{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1导入数据\n",
    "# 2数据探索预处理\n",
    "## 2.1根据数据制造我们要预测出来的特征 订单欺诈fraud和订单延迟late_delivery\n",
    "## 2.2对特征进行选择\n",
    "## 2.3去除类别较少的特征\n",
    "## 2.4去除一些跟预测结果无关的特征\n",
    "## 2.5去除导致标签泄漏的特征\n",
    "## 2.6处理非数值型的特征\n",
    "# 3fraud欺诈预测 准备特征和标签 \n",
    "# 4数据切分和模型预测\n",
    "# 5late_delivery订单延期预测（类似仿照步骤3-4）\n",
    "# 6其他模型进行预测\n",
    "## 6.1高斯朴素贝叶斯\n",
    "## 6.2伯努利朴素贝叶斯\n",
    "## 6.3 SVM\n",
    "## 6.4决策树\n",
    "## 6.5神经网络keras\n",
    "## 6.6模型融合\n",
    "## 6.7随机森林\n",
    "## 6.8Xgboost\n",
    "# 7回归任务：对销售额Sales以及订单数量Order Item Quantity预测\n",
    "## 7.1准备特征和标签\n",
    "## 7.2数据集切分\n",
    "## 7.3模型预测和评估\n",
    "### 7.3.1线性回归\n",
    "### 7.3.2Lasso回归\n",
    "### 7.3.3Ridge回归\n",
    "### 7.3.4回归树\n",
    "### 7.3.5Xgboost\n",
    "### 7.3.6lightgbm\n",
    "### 7.3.7随机森林\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1导入数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Type</th>\n",
       "      <th>Days for shipping (real)</th>\n",
       "      <th>Days for shipment (scheduled)</th>\n",
       "      <th>Benefit per order</th>\n",
       "      <th>Sales per customer</th>\n",
       "      <th>Delivery Status</th>\n",
       "      <th>Late_delivery_risk</th>\n",
       "      <th>Category Id</th>\n",
       "      <th>Category Name</th>\n",
       "      <th>Customer City</th>\n",
       "      <th>Customer Country</th>\n",
       "      <th>Customer Email</th>\n",
       "      <th>Customer Fname</th>\n",
       "      <th>Customer Id</th>\n",
       "      <th>Customer Lname</th>\n",
       "      <th>Customer Password</th>\n",
       "      <th>Customer Segment</th>\n",
       "      <th>Customer State</th>\n",
       "      <th>Customer Street</th>\n",
       "      <th>Customer Zipcode</th>\n",
       "      <th>Department Id</th>\n",
       "      <th>Department Name</th>\n",
       "      <th>Latitude</th>\n",
       "      <th>Longitude</th>\n",
       "      <th>Market</th>\n",
       "      <th>Order City</th>\n",
       "      <th>Order Country</th>\n",
       "      <th>Order Customer Id</th>\n",
       "      <th>order date (DateOrders)</th>\n",
       "      <th>Order Id</th>\n",
       "      <th>Order Item Cardprod Id</th>\n",
       "      <th>Order Item Discount</th>\n",
       "      <th>Order Item Discount Rate</th>\n",
       "      <th>Order Item Id</th>\n",
       "      <th>Order Item Product Price</th>\n",
       "      <th>Order Item Profit Ratio</th>\n",
       "      <th>Order Item Quantity</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Order Item Total</th>\n",
       "      <th>Order Profit Per Order</th>\n",
       "      <th>Order Region</th>\n",
       "      <th>Order State</th>\n",
       "      <th>Order Status</th>\n",
       "      <th>Order Zipcode</th>\n",
       "      <th>Product Card Id</th>\n",
       "      <th>Product Category Id</th>\n",
       "      <th>Product Description</th>\n",
       "      <th>Product Image</th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Product Price</th>\n",
       "      <th>Product Status</th>\n",
       "      <th>shipping date (DateOrders)</th>\n",
       "      <th>Shipping Mode</th>\n",
       "      <th>Customer Full Name</th>\n",
       "      <th>order_year</th>\n",
       "      <th>order_month</th>\n",
       "      <th>order_weekday</th>\n",
       "      <th>order_hour</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>91.250000</td>\n",
       "      <td>314.640015</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Cally</td>\n",
       "      <td>20755</td>\n",
       "      <td>Holloway</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>5365 Noble Nectar Island</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.251453</td>\n",
       "      <td>-66.037056</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bekasi</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>20755</td>\n",
       "      <td>2018-01-31 22:56:00</td>\n",
       "      <td>77202</td>\n",
       "      <td>1360</td>\n",
       "      <td>13.110000</td>\n",
       "      <td>0.04</td>\n",
       "      <td>180517</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.29</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>314.640015</td>\n",
       "      <td>91.250000</td>\n",
       "      <td>Southeast Asia</td>\n",
       "      <td>Java Occidental</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>2/3/2018 22:56</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HollowayCally</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>-249.089996</td>\n",
       "      <td>311.359985</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Irene</td>\n",
       "      <td>19492</td>\n",
       "      <td>Luna</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>2679 Rustic Loop</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.279451</td>\n",
       "      <td>-66.037064</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>19492</td>\n",
       "      <td>2018-01-13 12:27:00</td>\n",
       "      <td>75939</td>\n",
       "      <td>1360</td>\n",
       "      <td>16.389999</td>\n",
       "      <td>0.05</td>\n",
       "      <td>179254</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>-0.80</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>311.359985</td>\n",
       "      <td>-249.089996</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>PENDING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/18/2018 12:27</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LunaIrene</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CASH</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>-247.779999</td>\n",
       "      <td>309.720001</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>San Jose</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Gillian</td>\n",
       "      <td>19491</td>\n",
       "      <td>Maldonado</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>CA</td>\n",
       "      <td>8510 Round Bear Gate</td>\n",
       "      <td>95125.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>37.292233</td>\n",
       "      <td>-121.881279</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>19491</td>\n",
       "      <td>2018-01-13 12:06:00</td>\n",
       "      <td>75938</td>\n",
       "      <td>1360</td>\n",
       "      <td>18.030001</td>\n",
       "      <td>0.06</td>\n",
       "      <td>179253</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>-0.80</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>309.720001</td>\n",
       "      <td>-247.779999</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>CLOSED</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/17/2018 12:06</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>MaldonadoGillian</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>22.860001</td>\n",
       "      <td>304.809998</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Tana</td>\n",
       "      <td>19490</td>\n",
       "      <td>Tate</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>CA</td>\n",
       "      <td>3200 Amber Bend</td>\n",
       "      <td>90027.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>34.125946</td>\n",
       "      <td>-118.291016</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>19490</td>\n",
       "      <td>2018-01-13 11:45:00</td>\n",
       "      <td>75937</td>\n",
       "      <td>1360</td>\n",
       "      <td>22.940001</td>\n",
       "      <td>0.07</td>\n",
       "      <td>179252</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.08</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>304.809998</td>\n",
       "      <td>22.860001</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/16/2018 11:45</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>TateTana</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>134.210007</td>\n",
       "      <td>298.250000</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Orli</td>\n",
       "      <td>19489</td>\n",
       "      <td>Hendricks</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>PR</td>\n",
       "      <td>8671 Iron Anchor Corners</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.253769</td>\n",
       "      <td>-66.037048</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>19489</td>\n",
       "      <td>2018-01-13 11:24:00</td>\n",
       "      <td>75936</td>\n",
       "      <td>1360</td>\n",
       "      <td>29.500000</td>\n",
       "      <td>0.09</td>\n",
       "      <td>179251</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.45</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>298.250000</td>\n",
       "      <td>134.210007</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/15/2018 11:24</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HendricksOrli</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180514</th>\n",
       "      <td>CASH</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>40.000000</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Maria</td>\n",
       "      <td>1005</td>\n",
       "      <td>Peterson</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>NY</td>\n",
       "      <td>1322 Broad Glade</td>\n",
       "      <td>11207.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>40.640930</td>\n",
       "      <td>-73.942711</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>China</td>\n",
       "      <td>1005</td>\n",
       "      <td>2016-01-16 03:40:00</td>\n",
       "      <td>26043</td>\n",
       "      <td>1004</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.00</td>\n",
       "      <td>65177</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.10</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>40.000000</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>CLOSED</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/20/2016 3:40</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PetersonMaria</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180515</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>-613.770019</td>\n",
       "      <td>395.980011</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bakersfield</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Ronald</td>\n",
       "      <td>9141</td>\n",
       "      <td>Clark</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CA</td>\n",
       "      <td>7330 Broad Apple Moor</td>\n",
       "      <td>93304.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>35.362545</td>\n",
       "      <td>-119.018700</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Hirakata</td>\n",
       "      <td>Japón</td>\n",
       "      <td>9141</td>\n",
       "      <td>2016-01-16 01:34:00</td>\n",
       "      <td>26037</td>\n",
       "      <td>1004</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>0.01</td>\n",
       "      <td>65161</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>-1.55</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>395.980011</td>\n",
       "      <td>-613.770019</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Osaka</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/19/2016 1:34</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>ClarkRonald</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180516</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>141.110001</td>\n",
       "      <td>391.980011</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bristol</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>John</td>\n",
       "      <td>291</td>\n",
       "      <td>Smith</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CT</td>\n",
       "      <td>97 Burning Landing</td>\n",
       "      <td>6010.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>41.629959</td>\n",
       "      <td>-72.967155</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>291</td>\n",
       "      <td>2016-01-15 21:00:00</td>\n",
       "      <td>26024</td>\n",
       "      <td>1004</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>0.02</td>\n",
       "      <td>65129</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.36</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>391.980011</td>\n",
       "      <td>141.110001</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>PENDING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/20/2016 21:00</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithJohn</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180517</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>186.229996</td>\n",
       "      <td>387.980011</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Mary</td>\n",
       "      <td>2813</td>\n",
       "      <td>Smith</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>2585 Silent Autumn Landing</td>\n",
       "      <td>725.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>18.213350</td>\n",
       "      <td>-66.370575</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>2813</td>\n",
       "      <td>2016-01-15 20:18:00</td>\n",
       "      <td>26022</td>\n",
       "      <td>1004</td>\n",
       "      <td>12.000000</td>\n",
       "      <td>0.03</td>\n",
       "      <td>65126</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.48</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>387.980011</td>\n",
       "      <td>186.229996</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/18/2016 20:18</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithMary</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180518</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>168.949997</td>\n",
       "      <td>383.980011</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Andrea</td>\n",
       "      <td>7547</td>\n",
       "      <td>Ortega</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>697 Little Meadow</td>\n",
       "      <td>725.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>18.290380</td>\n",
       "      <td>-66.370613</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Nagercoil</td>\n",
       "      <td>India</td>\n",
       "      <td>7547</td>\n",
       "      <td>2016-01-15 18:54:00</td>\n",
       "      <td>26018</td>\n",
       "      <td>1004</td>\n",
       "      <td>16.000000</td>\n",
       "      <td>0.04</td>\n",
       "      <td>65113</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.44</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>383.980011</td>\n",
       "      <td>168.949997</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Tamil Nadu</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/19/2016 18:54</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>OrtegaAndrea</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>180519 rows × 58 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Type  Days for shipping (real)  Days for shipment (scheduled)  \\\n",
       "0          DEBIT                         3                              4   \n",
       "1       TRANSFER                         5                              4   \n",
       "2           CASH                         4                              4   \n",
       "3          DEBIT                         3                              4   \n",
       "4        PAYMENT                         2                              4   \n",
       "...          ...                       ...                            ...   \n",
       "180514      CASH                         4                              4   \n",
       "180515     DEBIT                         3                              2   \n",
       "180516  TRANSFER                         5                              4   \n",
       "180517   PAYMENT                         3                              4   \n",
       "180518   PAYMENT                         4                              4   \n",
       "\n",
       "        Benefit per order  Sales per customer   Delivery Status  \\\n",
       "0               91.250000          314.640015  Advance shipping   \n",
       "1             -249.089996          311.359985     Late delivery   \n",
       "2             -247.779999          309.720001  Shipping on time   \n",
       "3               22.860001          304.809998  Advance shipping   \n",
       "4              134.210007          298.250000  Advance shipping   \n",
       "...                   ...                 ...               ...   \n",
       "180514          40.000000          399.980011  Shipping on time   \n",
       "180515        -613.770019          395.980011     Late delivery   \n",
       "180516         141.110001          391.980011     Late delivery   \n",
       "180517         186.229996          387.980011  Advance shipping   \n",
       "180518         168.949997          383.980011  Shipping on time   \n",
       "\n",
       "        Late_delivery_risk  Category Id   Category Name Customer City  \\\n",
       "0                        0           73  Sporting Goods        Caguas   \n",
       "1                        1           73  Sporting Goods        Caguas   \n",
       "2                        0           73  Sporting Goods      San Jose   \n",
       "3                        0           73  Sporting Goods   Los Angeles   \n",
       "4                        0           73  Sporting Goods        Caguas   \n",
       "...                    ...          ...             ...           ...   \n",
       "180514                   0           45         Fishing      Brooklyn   \n",
       "180515                   1           45         Fishing   Bakersfield   \n",
       "180516                   1           45         Fishing       Bristol   \n",
       "180517                   0           45         Fishing        Caguas   \n",
       "180518                   0           45         Fishing        Caguas   \n",
       "\n",
       "       Customer Country Customer Email Customer Fname  Customer Id  \\\n",
       "0           Puerto Rico      XXXXXXXXX          Cally        20755   \n",
       "1           Puerto Rico      XXXXXXXXX          Irene        19492   \n",
       "2               EE. UU.      XXXXXXXXX        Gillian        19491   \n",
       "3               EE. UU.      XXXXXXXXX           Tana        19490   \n",
       "4           Puerto Rico      XXXXXXXXX           Orli        19489   \n",
       "...                 ...            ...            ...          ...   \n",
       "180514          EE. UU.      XXXXXXXXX          Maria         1005   \n",
       "180515          EE. UU.      XXXXXXXXX         Ronald         9141   \n",
       "180516          EE. UU.      XXXXXXXXX           John          291   \n",
       "180517      Puerto Rico      XXXXXXXXX           Mary         2813   \n",
       "180518      Puerto Rico      XXXXXXXXX         Andrea         7547   \n",
       "\n",
       "       Customer Lname Customer Password Customer Segment Customer State  \\\n",
       "0            Holloway         XXXXXXXXX         Consumer             PR   \n",
       "1                Luna         XXXXXXXXX         Consumer             PR   \n",
       "2           Maldonado         XXXXXXXXX         Consumer             CA   \n",
       "3                Tate         XXXXXXXXX      Home Office             CA   \n",
       "4           Hendricks         XXXXXXXXX        Corporate             PR   \n",
       "...               ...               ...              ...            ...   \n",
       "180514       Peterson         XXXXXXXXX      Home Office             NY   \n",
       "180515          Clark         XXXXXXXXX        Corporate             CA   \n",
       "180516          Smith         XXXXXXXXX        Corporate             CT   \n",
       "180517          Smith         XXXXXXXXX         Consumer             PR   \n",
       "180518         Ortega         XXXXXXXXX         Consumer             PR   \n",
       "\n",
       "                   Customer Street  Customer Zipcode  Department Id  \\\n",
       "0         5365 Noble Nectar Island             725.0              2   \n",
       "1                 2679 Rustic Loop             725.0              2   \n",
       "2             8510 Round Bear Gate           95125.0              2   \n",
       "3                  3200 Amber Bend           90027.0              2   \n",
       "4         8671 Iron Anchor Corners             725.0              2   \n",
       "...                            ...               ...            ...   \n",
       "180514            1322 Broad Glade           11207.0              7   \n",
       "180515       7330 Broad Apple Moor           93304.0              7   \n",
       "180516          97 Burning Landing            6010.0              7   \n",
       "180517  2585 Silent Autumn Landing             725.0              7   \n",
       "180518           697 Little Meadow             725.0              7   \n",
       "\n",
       "       Department Name   Latitude   Longitude        Market  Order City  \\\n",
       "0              Fitness  18.251453  -66.037056  Pacific Asia      Bekasi   \n",
       "1              Fitness  18.279451  -66.037064  Pacific Asia     Bikaner   \n",
       "2              Fitness  37.292233 -121.881279  Pacific Asia     Bikaner   \n",
       "3              Fitness  34.125946 -118.291016  Pacific Asia  Townsville   \n",
       "4              Fitness  18.253769  -66.037048  Pacific Asia  Townsville   \n",
       "...                ...        ...         ...           ...         ...   \n",
       "180514        Fan Shop  40.640930  -73.942711  Pacific Asia    Shanghái   \n",
       "180515        Fan Shop  35.362545 -119.018700  Pacific Asia    Hirakata   \n",
       "180516        Fan Shop  41.629959  -72.967155  Pacific Asia    Adelaide   \n",
       "180517        Fan Shop  18.213350  -66.370575  Pacific Asia    Adelaide   \n",
       "180518        Fan Shop  18.290380  -66.370613  Pacific Asia   Nagercoil   \n",
       "\n",
       "       Order Country  Order Customer Id order date (DateOrders)  Order Id  \\\n",
       "0          Indonesia              20755     2018-01-31 22:56:00     77202   \n",
       "1              India              19492     2018-01-13 12:27:00     75939   \n",
       "2              India              19491     2018-01-13 12:06:00     75938   \n",
       "3          Australia              19490     2018-01-13 11:45:00     75937   \n",
       "4          Australia              19489     2018-01-13 11:24:00     75936   \n",
       "...              ...                ...                     ...       ...   \n",
       "180514         China               1005     2016-01-16 03:40:00     26043   \n",
       "180515         Japón               9141     2016-01-16 01:34:00     26037   \n",
       "180516     Australia                291     2016-01-15 21:00:00     26024   \n",
       "180517     Australia               2813     2016-01-15 20:18:00     26022   \n",
       "180518         India               7547     2016-01-15 18:54:00     26018   \n",
       "\n",
       "        Order Item Cardprod Id  Order Item Discount  Order Item Discount Rate  \\\n",
       "0                         1360            13.110000                      0.04   \n",
       "1                         1360            16.389999                      0.05   \n",
       "2                         1360            18.030001                      0.06   \n",
       "3                         1360            22.940001                      0.07   \n",
       "4                         1360            29.500000                      0.09   \n",
       "...                        ...                  ...                       ...   \n",
       "180514                    1004             0.000000                      0.00   \n",
       "180515                    1004             4.000000                      0.01   \n",
       "180516                    1004             8.000000                      0.02   \n",
       "180517                    1004            12.000000                      0.03   \n",
       "180518                    1004            16.000000                      0.04   \n",
       "\n",
       "        Order Item Id  Order Item Product Price  Order Item Profit Ratio  \\\n",
       "0              180517                327.750000                     0.29   \n",
       "1              179254                327.750000                    -0.80   \n",
       "2              179253                327.750000                    -0.80   \n",
       "3              179252                327.750000                     0.08   \n",
       "4              179251                327.750000                     0.45   \n",
       "...               ...                       ...                      ...   \n",
       "180514          65177                399.980011                     0.10   \n",
       "180515          65161                399.980011                    -1.55   \n",
       "180516          65129                399.980011                     0.36   \n",
       "180517          65126                399.980011                     0.48   \n",
       "180518          65113                399.980011                     0.44   \n",
       "\n",
       "        Order Item Quantity       Sales  Order Item Total  \\\n",
       "0                         1  327.750000        314.640015   \n",
       "1                         1  327.750000        311.359985   \n",
       "2                         1  327.750000        309.720001   \n",
       "3                         1  327.750000        304.809998   \n",
       "4                         1  327.750000        298.250000   \n",
       "...                     ...         ...               ...   \n",
       "180514                    1  399.980011        399.980011   \n",
       "180515                    1  399.980011        395.980011   \n",
       "180516                    1  399.980011        391.980011   \n",
       "180517                    1  399.980011        387.980011   \n",
       "180518                    1  399.980011        383.980011   \n",
       "\n",
       "        Order Profit Per Order    Order Region        Order State  \\\n",
       "0                    91.250000  Southeast Asia    Java Occidental   \n",
       "1                  -249.089996      South Asia           Rajastán   \n",
       "2                  -247.779999      South Asia           Rajastán   \n",
       "3                    22.860001         Oceania         Queensland   \n",
       "4                   134.210007         Oceania         Queensland   \n",
       "...                        ...             ...                ...   \n",
       "180514               40.000000    Eastern Asia           Shanghái   \n",
       "180515             -613.770019    Eastern Asia              Osaka   \n",
       "180516              141.110001         Oceania  Australia del Sur   \n",
       "180517              186.229996         Oceania  Australia del Sur   \n",
       "180518              168.949997      South Asia         Tamil Nadu   \n",
       "\n",
       "           Order Status  Order Zipcode  Product Card Id  Product Category Id  \\\n",
       "0              COMPLETE            NaN             1360                   73   \n",
       "1               PENDING            NaN             1360                   73   \n",
       "2                CLOSED            NaN             1360                   73   \n",
       "3              COMPLETE            NaN             1360                   73   \n",
       "4       PENDING_PAYMENT            NaN             1360                   73   \n",
       "...                 ...            ...              ...                  ...   \n",
       "180514           CLOSED            NaN             1004                   45   \n",
       "180515         COMPLETE            NaN             1004                   45   \n",
       "180516          PENDING            NaN             1004                   45   \n",
       "180517  PENDING_PAYMENT            NaN             1004                   45   \n",
       "180518  PENDING_PAYMENT            NaN             1004                   45   \n",
       "\n",
       "        Product Description  \\\n",
       "0                       NaN   \n",
       "1                       NaN   \n",
       "2                       NaN   \n",
       "3                       NaN   \n",
       "4                       NaN   \n",
       "...                     ...   \n",
       "180514                  NaN   \n",
       "180515                  NaN   \n",
       "180516                  NaN   \n",
       "180517                  NaN   \n",
       "180518                  NaN   \n",
       "\n",
       "                                            Product Image  \\\n",
       "0            http://images.acmesports.sports/Smart+watch    \n",
       "1            http://images.acmesports.sports/Smart+watch    \n",
       "2            http://images.acmesports.sports/Smart+watch    \n",
       "3            http://images.acmesports.sports/Smart+watch    \n",
       "4            http://images.acmesports.sports/Smart+watch    \n",
       "...                                                   ...   \n",
       "180514  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180515  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180516  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180517  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180518  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "\n",
       "                                     Product Name  Product Price  \\\n",
       "0                                    Smart watch      327.750000   \n",
       "1                                    Smart watch      327.750000   \n",
       "2                                    Smart watch      327.750000   \n",
       "3                                    Smart watch      327.750000   \n",
       "4                                    Smart watch      327.750000   \n",
       "...                                           ...            ...   \n",
       "180514  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180515  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180516  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180517  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180518  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "\n",
       "        Product Status shipping date (DateOrders)   Shipping Mode  \\\n",
       "0                    0             2/3/2018 22:56  Standard Class   \n",
       "1                    0            1/18/2018 12:27  Standard Class   \n",
       "2                    0            1/17/2018 12:06  Standard Class   \n",
       "3                    0            1/16/2018 11:45  Standard Class   \n",
       "4                    0            1/15/2018 11:24  Standard Class   \n",
       "...                ...                        ...             ...   \n",
       "180514               0             1/20/2016 3:40  Standard Class   \n",
       "180515               0             1/19/2016 1:34    Second Class   \n",
       "180516               0            1/20/2016 21:00  Standard Class   \n",
       "180517               0            1/18/2016 20:18  Standard Class   \n",
       "180518               0            1/19/2016 18:54  Standard Class   \n",
       "\n",
       "       Customer Full Name  order_year  order_month  order_weekday  order_hour  \n",
       "0           HollowayCally        2018            1              2          22  \n",
       "1               LunaIrene        2018            1              5          12  \n",
       "2        MaldonadoGillian        2018            1              5          12  \n",
       "3                TateTana        2018            1              5          11  \n",
       "4           HendricksOrli        2018            1              5          11  \n",
       "...                   ...         ...          ...            ...         ...  \n",
       "180514      PetersonMaria        2016            1              5           3  \n",
       "180515        ClarkRonald        2016            1              5           1  \n",
       "180516          SmithJohn        2016            1              4          21  \n",
       "180517          SmithMary        2016            1              4          20  \n",
       "180518       OrtegaAndrea        2016            1              4          18  \n",
       "\n",
       "[180519 rows x 58 columns]"
      ]
     },
     "execution_count": 148,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The previous step saved the data with pickle as a .pkl file; this format is convenient and quick to load, and somewhat smaller than the CSV\n",
    "import pickle\n",
    "with open('mydata.pkl','rb') as file:\n",
    "    train_data=pickle.load(file)\n",
    "train_data    # the data with the previous step's feature engineering applied is now loaded"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Type</th>\n",
       "      <th>Days for shipping (real)</th>\n",
       "      <th>Days for shipment (scheduled)</th>\n",
       "      <th>Benefit per order</th>\n",
       "      <th>Sales per customer</th>\n",
       "      <th>Delivery Status</th>\n",
       "      <th>Late_delivery_risk</th>\n",
       "      <th>Category Id</th>\n",
       "      <th>Category Name</th>\n",
       "      <th>Customer City</th>\n",
       "      <th>Customer Country</th>\n",
       "      <th>Customer Email</th>\n",
       "      <th>Customer Fname</th>\n",
       "      <th>Customer Id</th>\n",
       "      <th>Customer Lname</th>\n",
       "      <th>Customer Password</th>\n",
       "      <th>Customer Segment</th>\n",
       "      <th>Customer State</th>\n",
       "      <th>Customer Street</th>\n",
       "      <th>Customer Zipcode</th>\n",
       "      <th>Department Id</th>\n",
       "      <th>Department Name</th>\n",
       "      <th>Latitude</th>\n",
       "      <th>Longitude</th>\n",
       "      <th>Market</th>\n",
       "      <th>Order City</th>\n",
       "      <th>Order Country</th>\n",
       "      <th>Order Customer Id</th>\n",
       "      <th>order date (DateOrders)</th>\n",
       "      <th>Order Id</th>\n",
       "      <th>Order Item Cardprod Id</th>\n",
       "      <th>Order Item Discount</th>\n",
       "      <th>Order Item Discount Rate</th>\n",
       "      <th>Order Item Id</th>\n",
       "      <th>Order Item Product Price</th>\n",
       "      <th>Order Item Profit Ratio</th>\n",
       "      <th>Order Item Quantity</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Order Item Total</th>\n",
       "      <th>Order Profit Per Order</th>\n",
       "      <th>Order Region</th>\n",
       "      <th>Order State</th>\n",
       "      <th>Order Status</th>\n",
       "      <th>Order Zipcode</th>\n",
       "      <th>Product Card Id</th>\n",
       "      <th>Product Category Id</th>\n",
       "      <th>Product Description</th>\n",
       "      <th>Product Image</th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Product Price</th>\n",
       "      <th>Product Status</th>\n",
       "      <th>shipping date (DateOrders)</th>\n",
       "      <th>Shipping Mode</th>\n",
       "      <th>Customer Full Name</th>\n",
       "      <th>order_year</th>\n",
       "      <th>order_month</th>\n",
       "      <th>order_weekday</th>\n",
       "      <th>order_hour</th>\n",
       "      <th>order_month_year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>91.250000</td>\n",
       "      <td>314.640015</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Cally</td>\n",
       "      <td>20755</td>\n",
       "      <td>Holloway</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>5365 Noble Nectar Island</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.251453</td>\n",
       "      <td>-66.037056</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bekasi</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>20755</td>\n",
       "      <td>2018-01-31 22:56:00</td>\n",
       "      <td>77202</td>\n",
       "      <td>1360</td>\n",
       "      <td>13.110000</td>\n",
       "      <td>0.04</td>\n",
       "      <td>180517</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.29</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>314.640015</td>\n",
       "      <td>91.250000</td>\n",
       "      <td>Southeast Asia</td>\n",
       "      <td>Java Occidental</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>2/3/2018 22:56</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HollowayCally</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>22</td>\n",
       "      <td>2018-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>-249.089996</td>\n",
       "      <td>311.359985</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Irene</td>\n",
       "      <td>19492</td>\n",
       "      <td>Luna</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>2679 Rustic Loop</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.279451</td>\n",
       "      <td>-66.037064</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>19492</td>\n",
       "      <td>2018-01-13 12:27:00</td>\n",
       "      <td>75939</td>\n",
       "      <td>1360</td>\n",
       "      <td>16.389999</td>\n",
       "      <td>0.05</td>\n",
       "      <td>179254</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>-0.80</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>311.359985</td>\n",
       "      <td>-249.089996</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>PENDING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/18/2018 12:27</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LunaIrene</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "      <td>2018-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CASH</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>-247.779999</td>\n",
       "      <td>309.720001</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>San Jose</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Gillian</td>\n",
       "      <td>19491</td>\n",
       "      <td>Maldonado</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>CA</td>\n",
       "      <td>8510 Round Bear Gate</td>\n",
       "      <td>95125.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>37.292233</td>\n",
       "      <td>-121.881279</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>19491</td>\n",
       "      <td>2018-01-13 12:06:00</td>\n",
       "      <td>75938</td>\n",
       "      <td>1360</td>\n",
       "      <td>18.030001</td>\n",
       "      <td>0.06</td>\n",
       "      <td>179253</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>-0.80</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>309.720001</td>\n",
       "      <td>-247.779999</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>CLOSED</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/17/2018 12:06</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>MaldonadoGillian</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "      <td>2018-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>22.860001</td>\n",
       "      <td>304.809998</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Tana</td>\n",
       "      <td>19490</td>\n",
       "      <td>Tate</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>CA</td>\n",
       "      <td>3200 Amber Bend</td>\n",
       "      <td>90027.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>34.125946</td>\n",
       "      <td>-118.291016</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>19490</td>\n",
       "      <td>2018-01-13 11:45:00</td>\n",
       "      <td>75937</td>\n",
       "      <td>1360</td>\n",
       "      <td>22.940001</td>\n",
       "      <td>0.07</td>\n",
       "      <td>179252</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.08</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>304.809998</td>\n",
       "      <td>22.860001</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/16/2018 11:45</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>TateTana</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>11</td>\n",
       "      <td>2018-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>134.210007</td>\n",
       "      <td>298.250000</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Orli</td>\n",
       "      <td>19489</td>\n",
       "      <td>Hendricks</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>PR</td>\n",
       "      <td>8671 Iron Anchor Corners</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.253769</td>\n",
       "      <td>-66.037048</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>19489</td>\n",
       "      <td>2018-01-13 11:24:00</td>\n",
       "      <td>75936</td>\n",
       "      <td>1360</td>\n",
       "      <td>29.500000</td>\n",
       "      <td>0.09</td>\n",
       "      <td>179251</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.45</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>298.250000</td>\n",
       "      <td>134.210007</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1360</td>\n",
       "      <td>73</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Smart+watch</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0</td>\n",
       "      <td>1/15/2018 11:24</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HendricksOrli</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>11</td>\n",
       "      <td>2018-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180514</th>\n",
       "      <td>CASH</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>40.000000</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Maria</td>\n",
       "      <td>1005</td>\n",
       "      <td>Peterson</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>NY</td>\n",
       "      <td>1322 Broad Glade</td>\n",
       "      <td>11207.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>40.640930</td>\n",
       "      <td>-73.942711</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>China</td>\n",
       "      <td>1005</td>\n",
       "      <td>2016-01-16 03:40:00</td>\n",
       "      <td>26043</td>\n",
       "      <td>1004</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.00</td>\n",
       "      <td>65177</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.10</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>40.000000</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>CLOSED</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/20/2016 3:40</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PetersonMaria</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>2016-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180515</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>-613.770019</td>\n",
       "      <td>395.980011</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bakersfield</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Ronald</td>\n",
       "      <td>9141</td>\n",
       "      <td>Clark</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CA</td>\n",
       "      <td>7330 Broad Apple Moor</td>\n",
       "      <td>93304.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>35.362545</td>\n",
       "      <td>-119.018700</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Hirakata</td>\n",
       "      <td>Japón</td>\n",
       "      <td>9141</td>\n",
       "      <td>2016-01-16 01:34:00</td>\n",
       "      <td>26037</td>\n",
       "      <td>1004</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>0.01</td>\n",
       "      <td>65161</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>-1.55</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>395.980011</td>\n",
       "      <td>-613.770019</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Osaka</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/19/2016 1:34</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>ClarkRonald</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>2016-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180516</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>141.110001</td>\n",
       "      <td>391.980011</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bristol</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>John</td>\n",
       "      <td>291</td>\n",
       "      <td>Smith</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CT</td>\n",
       "      <td>97 Burning Landing</td>\n",
       "      <td>6010.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>41.629959</td>\n",
       "      <td>-72.967155</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>291</td>\n",
       "      <td>2016-01-15 21:00:00</td>\n",
       "      <td>26024</td>\n",
       "      <td>1004</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>0.02</td>\n",
       "      <td>65129</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.36</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>391.980011</td>\n",
       "      <td>141.110001</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>PENDING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/20/2016 21:00</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithJohn</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>21</td>\n",
       "      <td>2016-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180517</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>186.229996</td>\n",
       "      <td>387.980011</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Mary</td>\n",
       "      <td>2813</td>\n",
       "      <td>Smith</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>2585 Silent Autumn Landing</td>\n",
       "      <td>725.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>18.213350</td>\n",
       "      <td>-66.370575</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>2813</td>\n",
       "      <td>2016-01-15 20:18:00</td>\n",
       "      <td>26022</td>\n",
       "      <td>1004</td>\n",
       "      <td>12.000000</td>\n",
       "      <td>0.03</td>\n",
       "      <td>65126</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.48</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>387.980011</td>\n",
       "      <td>186.229996</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/18/2016 20:18</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithMary</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>20</td>\n",
       "      <td>2016-01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180518</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>168.949997</td>\n",
       "      <td>383.980011</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Andrea</td>\n",
       "      <td>7547</td>\n",
       "      <td>Ortega</td>\n",
       "      <td>XXXXXXXXX</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>697 Little Meadow</td>\n",
       "      <td>725.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>18.290380</td>\n",
       "      <td>-66.370613</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Nagercoil</td>\n",
       "      <td>India</td>\n",
       "      <td>7547</td>\n",
       "      <td>2016-01-15 18:54:00</td>\n",
       "      <td>26018</td>\n",
       "      <td>1004</td>\n",
       "      <td>16.000000</td>\n",
       "      <td>0.04</td>\n",
       "      <td>65113</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.44</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>383.980011</td>\n",
       "      <td>168.949997</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Tamil Nadu</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1004</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>http://images.acmesports.sports/Field+%26+Stre...</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0</td>\n",
       "      <td>1/19/2016 18:54</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>OrtegaAndrea</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>2016-01</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>180519 rows × 59 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Type  Days for shipping (real)  Days for shipment (scheduled)  \\\n",
       "0          DEBIT                         3                              4   \n",
       "1       TRANSFER                         5                              4   \n",
       "2           CASH                         4                              4   \n",
       "3          DEBIT                         3                              4   \n",
       "4        PAYMENT                         2                              4   \n",
       "...          ...                       ...                            ...   \n",
       "180514      CASH                         4                              4   \n",
       "180515     DEBIT                         3                              2   \n",
       "180516  TRANSFER                         5                              4   \n",
       "180517   PAYMENT                         3                              4   \n",
       "180518   PAYMENT                         4                              4   \n",
       "\n",
       "        Benefit per order  Sales per customer   Delivery Status  \\\n",
       "0               91.250000          314.640015  Advance shipping   \n",
       "1             -249.089996          311.359985     Late delivery   \n",
       "2             -247.779999          309.720001  Shipping on time   \n",
       "3               22.860001          304.809998  Advance shipping   \n",
       "4              134.210007          298.250000  Advance shipping   \n",
       "...                   ...                 ...               ...   \n",
       "180514          40.000000          399.980011  Shipping on time   \n",
       "180515        -613.770019          395.980011     Late delivery   \n",
       "180516         141.110001          391.980011     Late delivery   \n",
       "180517         186.229996          387.980011  Advance shipping   \n",
       "180518         168.949997          383.980011  Shipping on time   \n",
       "\n",
       "        Late_delivery_risk  Category Id   Category Name Customer City  \\\n",
       "0                        0           73  Sporting Goods        Caguas   \n",
       "1                        1           73  Sporting Goods        Caguas   \n",
       "2                        0           73  Sporting Goods      San Jose   \n",
       "3                        0           73  Sporting Goods   Los Angeles   \n",
       "4                        0           73  Sporting Goods        Caguas   \n",
       "...                    ...          ...             ...           ...   \n",
       "180514                   0           45         Fishing      Brooklyn   \n",
       "180515                   1           45         Fishing   Bakersfield   \n",
       "180516                   1           45         Fishing       Bristol   \n",
       "180517                   0           45         Fishing        Caguas   \n",
       "180518                   0           45         Fishing        Caguas   \n",
       "\n",
       "       Customer Country Customer Email Customer Fname  Customer Id  \\\n",
       "0           Puerto Rico      XXXXXXXXX          Cally        20755   \n",
       "1           Puerto Rico      XXXXXXXXX          Irene        19492   \n",
       "2               EE. UU.      XXXXXXXXX        Gillian        19491   \n",
       "3               EE. UU.      XXXXXXXXX           Tana        19490   \n",
       "4           Puerto Rico      XXXXXXXXX           Orli        19489   \n",
       "...                 ...            ...            ...          ...   \n",
       "180514          EE. UU.      XXXXXXXXX          Maria         1005   \n",
       "180515          EE. UU.      XXXXXXXXX         Ronald         9141   \n",
       "180516          EE. UU.      XXXXXXXXX           John          291   \n",
       "180517      Puerto Rico      XXXXXXXXX           Mary         2813   \n",
       "180518      Puerto Rico      XXXXXXXXX         Andrea         7547   \n",
       "\n",
       "       Customer Lname Customer Password Customer Segment Customer State  \\\n",
       "0            Holloway         XXXXXXXXX         Consumer             PR   \n",
       "1                Luna         XXXXXXXXX         Consumer             PR   \n",
       "2           Maldonado         XXXXXXXXX         Consumer             CA   \n",
       "3                Tate         XXXXXXXXX      Home Office             CA   \n",
       "4           Hendricks         XXXXXXXXX        Corporate             PR   \n",
       "...               ...               ...              ...            ...   \n",
       "180514       Peterson         XXXXXXXXX      Home Office             NY   \n",
       "180515          Clark         XXXXXXXXX        Corporate             CA   \n",
       "180516          Smith         XXXXXXXXX        Corporate             CT   \n",
       "180517          Smith         XXXXXXXXX         Consumer             PR   \n",
       "180518         Ortega         XXXXXXXXX         Consumer             PR   \n",
       "\n",
       "                   Customer Street  Customer Zipcode  Department Id  \\\n",
       "0         5365 Noble Nectar Island             725.0              2   \n",
       "1                 2679 Rustic Loop             725.0              2   \n",
       "2             8510 Round Bear Gate           95125.0              2   \n",
       "3                  3200 Amber Bend           90027.0              2   \n",
       "4         8671 Iron Anchor Corners             725.0              2   \n",
       "...                            ...               ...            ...   \n",
       "180514            1322 Broad Glade           11207.0              7   \n",
       "180515       7330 Broad Apple Moor           93304.0              7   \n",
       "180516          97 Burning Landing            6010.0              7   \n",
       "180517  2585 Silent Autumn Landing             725.0              7   \n",
       "180518           697 Little Meadow             725.0              7   \n",
       "\n",
       "       Department Name   Latitude   Longitude        Market  Order City  \\\n",
       "0              Fitness  18.251453  -66.037056  Pacific Asia      Bekasi   \n",
       "1              Fitness  18.279451  -66.037064  Pacific Asia     Bikaner   \n",
       "2              Fitness  37.292233 -121.881279  Pacific Asia     Bikaner   \n",
       "3              Fitness  34.125946 -118.291016  Pacific Asia  Townsville   \n",
       "4              Fitness  18.253769  -66.037048  Pacific Asia  Townsville   \n",
       "...                ...        ...         ...           ...         ...   \n",
       "180514        Fan Shop  40.640930  -73.942711  Pacific Asia    Shanghái   \n",
       "180515        Fan Shop  35.362545 -119.018700  Pacific Asia    Hirakata   \n",
       "180516        Fan Shop  41.629959  -72.967155  Pacific Asia    Adelaide   \n",
       "180517        Fan Shop  18.213350  -66.370575  Pacific Asia    Adelaide   \n",
       "180518        Fan Shop  18.290380  -66.370613  Pacific Asia   Nagercoil   \n",
       "\n",
       "       Order Country  Order Customer Id order date (DateOrders)  Order Id  \\\n",
       "0          Indonesia              20755     2018-01-31 22:56:00     77202   \n",
       "1              India              19492     2018-01-13 12:27:00     75939   \n",
       "2              India              19491     2018-01-13 12:06:00     75938   \n",
       "3          Australia              19490     2018-01-13 11:45:00     75937   \n",
       "4          Australia              19489     2018-01-13 11:24:00     75936   \n",
       "...              ...                ...                     ...       ...   \n",
       "180514         China               1005     2016-01-16 03:40:00     26043   \n",
       "180515         Japón               9141     2016-01-16 01:34:00     26037   \n",
       "180516     Australia                291     2016-01-15 21:00:00     26024   \n",
       "180517     Australia               2813     2016-01-15 20:18:00     26022   \n",
       "180518         India               7547     2016-01-15 18:54:00     26018   \n",
       "\n",
       "        Order Item Cardprod Id  Order Item Discount  Order Item Discount Rate  \\\n",
       "0                         1360            13.110000                      0.04   \n",
       "1                         1360            16.389999                      0.05   \n",
       "2                         1360            18.030001                      0.06   \n",
       "3                         1360            22.940001                      0.07   \n",
       "4                         1360            29.500000                      0.09   \n",
       "...                        ...                  ...                       ...   \n",
       "180514                    1004             0.000000                      0.00   \n",
       "180515                    1004             4.000000                      0.01   \n",
       "180516                    1004             8.000000                      0.02   \n",
       "180517                    1004            12.000000                      0.03   \n",
       "180518                    1004            16.000000                      0.04   \n",
       "\n",
       "        Order Item Id  Order Item Product Price  Order Item Profit Ratio  \\\n",
       "0              180517                327.750000                     0.29   \n",
       "1              179254                327.750000                    -0.80   \n",
       "2              179253                327.750000                    -0.80   \n",
       "3              179252                327.750000                     0.08   \n",
       "4              179251                327.750000                     0.45   \n",
       "...               ...                       ...                      ...   \n",
       "180514          65177                399.980011                     0.10   \n",
       "180515          65161                399.980011                    -1.55   \n",
       "180516          65129                399.980011                     0.36   \n",
       "180517          65126                399.980011                     0.48   \n",
       "180518          65113                399.980011                     0.44   \n",
       "\n",
       "        Order Item Quantity       Sales  Order Item Total  \\\n",
       "0                         1  327.750000        314.640015   \n",
       "1                         1  327.750000        311.359985   \n",
       "2                         1  327.750000        309.720001   \n",
       "3                         1  327.750000        304.809998   \n",
       "4                         1  327.750000        298.250000   \n",
       "...                     ...         ...               ...   \n",
       "180514                    1  399.980011        399.980011   \n",
       "180515                    1  399.980011        395.980011   \n",
       "180516                    1  399.980011        391.980011   \n",
       "180517                    1  399.980011        387.980011   \n",
       "180518                    1  399.980011        383.980011   \n",
       "\n",
       "        Order Profit Per Order    Order Region        Order State  \\\n",
       "0                    91.250000  Southeast Asia    Java Occidental   \n",
       "1                  -249.089996      South Asia           Rajastán   \n",
       "2                  -247.779999      South Asia           Rajastán   \n",
       "3                    22.860001         Oceania         Queensland   \n",
       "4                   134.210007         Oceania         Queensland   \n",
       "...                        ...             ...                ...   \n",
       "180514               40.000000    Eastern Asia           Shanghái   \n",
       "180515             -613.770019    Eastern Asia              Osaka   \n",
       "180516              141.110001         Oceania  Australia del Sur   \n",
       "180517              186.229996         Oceania  Australia del Sur   \n",
       "180518              168.949997      South Asia         Tamil Nadu   \n",
       "\n",
       "           Order Status  Order Zipcode  Product Card Id  Product Category Id  \\\n",
       "0              COMPLETE            NaN             1360                   73   \n",
       "1               PENDING            NaN             1360                   73   \n",
       "2                CLOSED            NaN             1360                   73   \n",
       "3              COMPLETE            NaN             1360                   73   \n",
       "4       PENDING_PAYMENT            NaN             1360                   73   \n",
       "...                 ...            ...              ...                  ...   \n",
       "180514           CLOSED            NaN             1004                   45   \n",
       "180515         COMPLETE            NaN             1004                   45   \n",
       "180516          PENDING            NaN             1004                   45   \n",
       "180517  PENDING_PAYMENT            NaN             1004                   45   \n",
       "180518  PENDING_PAYMENT            NaN             1004                   45   \n",
       "\n",
       "        Product Description  \\\n",
       "0                       NaN   \n",
       "1                       NaN   \n",
       "2                       NaN   \n",
       "3                       NaN   \n",
       "4                       NaN   \n",
       "...                     ...   \n",
       "180514                  NaN   \n",
       "180515                  NaN   \n",
       "180516                  NaN   \n",
       "180517                  NaN   \n",
       "180518                  NaN   \n",
       "\n",
       "                                            Product Image  \\\n",
       "0            http://images.acmesports.sports/Smart+watch    \n",
       "1            http://images.acmesports.sports/Smart+watch    \n",
       "2            http://images.acmesports.sports/Smart+watch    \n",
       "3            http://images.acmesports.sports/Smart+watch    \n",
       "4            http://images.acmesports.sports/Smart+watch    \n",
       "...                                                   ...   \n",
       "180514  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180515  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180516  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180517  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "180518  http://images.acmesports.sports/Field+%26+Stre...   \n",
       "\n",
       "                                     Product Name  Product Price  \\\n",
       "0                                    Smart watch      327.750000   \n",
       "1                                    Smart watch      327.750000   \n",
       "2                                    Smart watch      327.750000   \n",
       "3                                    Smart watch      327.750000   \n",
       "4                                    Smart watch      327.750000   \n",
       "...                                           ...            ...   \n",
       "180514  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180515  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180516  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180517  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "180518  Field & Stream Sportsman 16 Gun Fire Safe     399.980011   \n",
       "\n",
       "        Product Status shipping date (DateOrders)   Shipping Mode  \\\n",
       "0                    0             2/3/2018 22:56  Standard Class   \n",
       "1                    0            1/18/2018 12:27  Standard Class   \n",
       "2                    0            1/17/2018 12:06  Standard Class   \n",
       "3                    0            1/16/2018 11:45  Standard Class   \n",
       "4                    0            1/15/2018 11:24  Standard Class   \n",
       "...                ...                        ...             ...   \n",
       "180514               0             1/20/2016 3:40  Standard Class   \n",
       "180515               0             1/19/2016 1:34    Second Class   \n",
       "180516               0            1/20/2016 21:00  Standard Class   \n",
       "180517               0            1/18/2016 20:18  Standard Class   \n",
       "180518               0            1/19/2016 18:54  Standard Class   \n",
       "\n",
       "       Customer Full Name  order_year  order_month  order_weekday  order_hour  \\\n",
       "0           HollowayCally        2018            1              2          22   \n",
       "1               LunaIrene        2018            1              5          12   \n",
       "2        MaldonadoGillian        2018            1              5          12   \n",
       "3                TateTana        2018            1              5          11   \n",
       "4           HendricksOrli        2018            1              5          11   \n",
       "...                   ...         ...          ...            ...         ...   \n",
       "180514      PetersonMaria        2016            1              5           3   \n",
       "180515        ClarkRonald        2016            1              5           1   \n",
       "180516          SmithJohn        2016            1              4          21   \n",
       "180517          SmithMary        2016            1              4          20   \n",
       "180518       OrtegaAndrea        2016            1              4          18   \n",
       "\n",
       "       order_month_year  \n",
       "0               2018-01  \n",
       "1               2018-01  \n",
       "2               2018-01  \n",
       "3               2018-01  \n",
       "4               2018-01  \n",
       "...                 ...  \n",
       "180514          2016-01  \n",
       "180515          2016-01  \n",
       "180516          2016-01  \n",
       "180517          2016-01  \n",
       "180518          2016-01  \n",
       "\n",
       "[180519 rows x 59 columns]"
      ]
     },
     "execution_count": 149,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add a column that combines the order year and month into a single feature\n",
    "temp = pd.DatetimeIndex(train_data['order date (DateOrders)'])  # convert the column to a DatetimeIndex\n",
    "train_data['order_month_year'] = temp.to_period('M')  # monthly period, displayed as YYYY-MM\n",
    "train_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2 Data exploration and preprocessing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 Derive the prediction targets from the data: order fraud (fraud) and late delivery (late_delivery)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    176457\n",
       "1      4062\n",
       "Name: fraud, dtype: int64"
      ]
     },
     "execution_count": 150,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Next task: order fraud prediction, a classification problem. Build the label first.\n",
    "# fraud is 1 when Order Status is SUSPECTED_FRAUD and 0 otherwise\n",
    "train_data['fraud'] = np.where(train_data['Order Status'] == 'SUSPECTED_FRAUD', 1, 0)\n",
    "train_data['fraud'].value_counts()  # train_data now has a fraud column; this shows its 0/1 distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4062"
      ]
     },
     "execution_count": 151,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Count the fraudulent orders in train_data\n",
    "train_data['fraud'].sum()  # redundant: the value_counts above already shows 4062 fraudulent orders"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Late delivery        98977\n",
       "Advance shipping     41592\n",
       "Shipping on time     32196\n",
       "Shipping canceled     7754\n",
       "Name: Delivery Status, dtype: int64"
      ]
     },
     "execution_count": 152,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The same approach works for late-delivery prediction; first explore the Delivery Status column\n",
    "train_data['Delivery Status'].value_counts()  # 'Late delivery' means the order was late; every other value means it was not"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 153,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    98977\n",
       "0    81542\n",
       "Name: late_delivery, dtype: int64"
      ]
     },
     "execution_count": 153,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# late_delivery is 1 when Delivery Status is 'Late delivery' and 0 otherwise\n",
    "train_data['late_delivery'] = np.where(train_data['Delivery Status'] == 'Late delivery', 1, 0)\n",
    "train_data['late_delivery'].value_counts()  # train_data now has a late_delivery column; this shows its 0/1 distribution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Feature selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 180519 entries, 0 to 180518\n",
      "Data columns (total 61 columns):\n",
      " #   Column                         Non-Null Count   Dtype         \n",
      "---  ------                         --------------   -----         \n",
      " 0   Type                           180519 non-null  object        \n",
      " 1   Days for shipping (real)       180519 non-null  int64         \n",
      " 2   Days for shipment (scheduled)  180519 non-null  int64         \n",
      " 3   Benefit per order              180519 non-null  float64       \n",
      " 4   Sales per customer             180519 non-null  float64       \n",
      " 5   Delivery Status                180519 non-null  object        \n",
      " 6   Late_delivery_risk             180519 non-null  int64         \n",
      " 7   Category Id                    180519 non-null  int64         \n",
      " 8   Category Name                  180519 non-null  object        \n",
      " 9   Customer City                  180519 non-null  object        \n",
      " 10  Customer Country               180519 non-null  object        \n",
      " 11  Customer Email                 180519 non-null  object        \n",
      " 12  Customer Fname                 180519 non-null  object        \n",
      " 13  Customer Id                    180519 non-null  int64         \n",
      " 14  Customer Lname                 180511 non-null  object        \n",
      " 15  Customer Password              180519 non-null  object        \n",
      " 16  Customer Segment               180519 non-null  object        \n",
      " 17  Customer State                 180519 non-null  object        \n",
      " 18  Customer Street                180519 non-null  object        \n",
      " 19  Customer Zipcode               180519 non-null  float64       \n",
      " 20  Department Id                  180519 non-null  int64         \n",
      " 21  Department Name                180519 non-null  object        \n",
      " 22  Latitude                       180519 non-null  float64       \n",
      " 23  Longitude                      180519 non-null  float64       \n",
      " 24  Market                         180519 non-null  object        \n",
      " 25  Order City                     180519 non-null  object        \n",
      " 26  Order Country                  180519 non-null  object        \n",
      " 27  Order Customer Id              180519 non-null  int64         \n",
      " 28  order date (DateOrders)        180519 non-null  datetime64[ns]\n",
      " 29  Order Id                       180519 non-null  int64         \n",
      " 30  Order Item Cardprod Id         180519 non-null  int64         \n",
      " 31  Order Item Discount            180519 non-null  float64       \n",
      " 32  Order Item Discount Rate       180519 non-null  float64       \n",
      " 33  Order Item Id                  180519 non-null  int64         \n",
      " 34  Order Item Product Price       180519 non-null  float64       \n",
      " 35  Order Item Profit Ratio        180519 non-null  float64       \n",
      " 36  Order Item Quantity            180519 non-null  int64         \n",
      " 37  Sales                          180519 non-null  float64       \n",
      " 38  Order Item Total               180519 non-null  float64       \n",
      " 39  Order Profit Per Order         180519 non-null  float64       \n",
      " 40  Order Region                   180519 non-null  object        \n",
      " 41  Order State                    180519 non-null  object        \n",
      " 42  Order Status                   180519 non-null  object        \n",
      " 43  Order Zipcode                  24840 non-null   float64       \n",
      " 44  Product Card Id                180519 non-null  int64         \n",
      " 45  Product Category Id            180519 non-null  int64         \n",
      " 46  Product Description            0 non-null       float64       \n",
      " 47  Product Image                  180519 non-null  object        \n",
      " 48  Product Name                   180519 non-null  object        \n",
      " 49  Product Price                  180519 non-null  float64       \n",
      " 50  Product Status                 180519 non-null  int64         \n",
      " 51  shipping date (DateOrders)     180519 non-null  object        \n",
      " 52  Shipping Mode                  180519 non-null  object        \n",
      " 53  Customer Full Name             180511 non-null  object        \n",
      " 54  order_year                     180519 non-null  int64         \n",
      " 55  order_month                    180519 non-null  int64         \n",
      " 56  order_weekday                  180519 non-null  int64         \n",
      " 57  order_hour                     180519 non-null  int64         \n",
      " 58  order_month_year               180519 non-null  period[M]     \n",
      " 59  fraud                          180519 non-null  int64         \n",
      " 60  late_delivery                  180519 non-null  int64         \n",
      "dtypes: datetime64[ns](1), float64(15), int64(20), object(24), period[M](1)\n",
      "memory usage: 84.0+ MB\n"
     ]
    }
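   ],
   "source": [
    "# Next, start selecting features for modeling\n",
    "train_data.info()  # There are many columns; object-dtype columns cannot be fed to the models as-is, so we select features and keep only the int and float columns we need"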
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Type', 'Delivery Status', 'Category Name', 'Customer City',\n",
       "       'Customer Country', 'Customer Email', 'Customer Fname',\n",
       "       'Customer Lname', 'Customer Password', 'Customer Segment',\n",
       "       'Customer State', 'Customer Street', 'Department Name', 'Market',\n",
       "       'Order City', 'Order Country', 'Order Region', 'Order State',\n",
       "       'Order Status', 'Product Image', 'Product Name',\n",
       "       'shipping date (DateOrders)', 'Shipping Mode', 'Customer Full Name'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 155,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select the categorical (object-dtype) columns\n",
    "categorical_cols=train_data.select_dtypes(include='object').columns\n",
    "categorical_cols  # All object-dtype columns from the listing above"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3 Remove features with few distinct values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "XXXXXXXXX    180519\n",
       "Name: Customer Email, dtype: int64"
      ]
     },
     "execution_count": 156,
     "metadata": {},
     "output_type": "execute_result"
    }
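   ],
   "source": [
    "# Many columns do not help prediction. Customer Email, for example, carries no signal, so it should be dropped\n",
    "train_data['Customer Email'].value_counts()  # Every value is the masked placeholder XXXXXXXXX, so there is nothing worth keeping"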
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "XXXXXXXXX    180519\n",
       "Name: Customer Password, dtype: int64"
      ]
     },
     "execution_count": 157,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data['Customer Password'].value_counts()  # Same here: every value is masked"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Late_delivery_risk\n",
      "Customer Country\n",
      "Customer Email\n",
      "Customer Password\n",
      "Product Description\n",
      "Product Status\n",
      "fraud\n",
      "late_delivery\n"
     ]
    }
   ],
   "source": [
    "# Loop over the columns and list those with very few distinct values\n",
    "for column in train_data.columns:\n",
    "    if train_data[column].nunique() < 3:  # Fewer than 3 distinct values: a candidate for removal, so inspect it\n",
    "        print(column)\n",
    "# Customer Email and Customer Password are flagged, as are the fraud and late_delivery labels we built earlier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    98977\n",
       "0    81542\n",
       "Name: Late_delivery_risk, dtype: int64"
      ]
     },
     "execution_count": 159,
     "metadata": {},
     "output_type": "execute_result"
    }
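   ],
   "source": [
    "# Look at the remaining flagged columns\n",
    "train_data['Late_delivery_risk'].value_counts()\n",
    "# This column marks whether the delivery was late (1 = late, 0 = not late). It duplicates our late_delivery label, so it must be dropped later to avoid label leakage"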
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "EE. UU.        111146\n",
       "Puerto Rico     69373\n",
       "Name: Customer Country, dtype: int64"
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data['Customer Country'].value_counts()  # Only two countries appear, so this feature probably adds little"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Series([], Name: Product Description, dtype: int64)"
      ]
     },
     "execution_count": 161,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data['Product Description'].value_counts()  # This column is entirely empty, so it is useless"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    180519\n",
       "Name: Product Status, dtype: int64"
      ]
     },
     "execution_count": 162,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data['Product Status'].value_counts()  # Every value is 0, so this column is useless too"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "118\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "http://images.acmesports.sports/Perfect+Fitness+Perfect+Rip+Deck                                 24515\n",
       "http://images.acmesports.sports/Nike+Men%27s+CJ+Elite+2+TD+Football+Cleat                        22246\n",
       "http://images.acmesports.sports/Nike+Men%27s+Dri-FIT+Victory+Golf+Polo                           21035\n",
       "http://images.acmesports.sports/O%27Brien+Men%27s+Neoprene+Life+Vest                             19298\n",
       "http://images.acmesports.sports/Field+%26+Stream+Sportsman+16+Gun+Fire+Safe                      17325\n",
       "                                                                                                 ...  \n",
       "http://images.acmesports.sports/Stiga+Master+Series+ST3100+Competition+Indoor+Table+Tennis...       27\n",
       "http://images.acmesports.sports/SOLE+E35+Elliptical                                                 15\n",
       "http://images.acmesports.sports/Bushnell+Pro+X7+Jolt+Slope+Rangefinder                              11\n",
       "http://images.acmesports.sports/SOLE+E25+Elliptical                                                 10\n",
       "http://images.acmesports.sports/Bowflex+SelectTech+1090+Dumbbells                                   10\n",
       "Name: Product Image, Length: 118, dtype: int64"
      ]
     },
     "execution_count": 163,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Product Image is not useful either\n",
    "print(len(train_data['Product Image'].value_counts()))\n",
    "train_data['Product Image'].value_counts()  # 118 distinct URL strings, and we are not doing NLP here, so drop it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Type', 'Days for shipping (real)', 'Days for shipment (scheduled)',\n",
       "       'Benefit per order', 'Sales per customer', 'Delivery Status',\n",
       "       'Late_delivery_risk', 'Category Id', 'Category Name', 'Customer City',\n",
       "       'Customer Country', 'Customer Email', 'Customer Fname', 'Customer Id',\n",
       "       'Customer Lname', 'Customer Password', 'Customer Segment',\n",
       "       'Customer State', 'Customer Street', 'Customer Zipcode',\n",
       "       'Department Id', 'Department Name', 'Latitude', 'Longitude', 'Market',\n",
       "       'Order City', 'Order Country', 'Order Customer Id',\n",
       "       'order date (DateOrders)', 'Order Id', 'Order Item Cardprod Id',\n",
       "       'Order Item Discount', 'Order Item Discount Rate', 'Order Item Id',\n",
       "       'Order Item Product Price', 'Order Item Profit Ratio',\n",
       "       'Order Item Quantity', 'Sales', 'Order Item Total',\n",
       "       'Order Profit Per Order', 'Order Region', 'Order State', 'Order Status',\n",
       "       'Order Zipcode', 'Product Card Id', 'Product Category Id',\n",
       "       'Product Description', 'Product Image', 'Product Name', 'Product Price',\n",
       "       'Product Status', 'shipping date (DateOrders)', 'Shipping Mode',\n",
       "       'Customer Full Name', 'order_year', 'order_month', 'order_weekday',\n",
       "       'order_hour', 'order_month_year', 'fraud', 'late_delivery'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 164,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {},
   "outputs": [
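    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of columns before dropping any features: 61\n"
     ]
    }
   ],
   "source": [
    "print(\"Total number of columns before dropping any features:\",len(train_data.columns))\n",
    "# Drop the features that do not help prediction, along with the name-related columns"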
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4 Remove features unrelated to the prediction target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 166,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of columns after dropping these features: 54\n"
     ]
    }
   ],
   "source": [
    "# Drop the masked, empty, constant, URL, and name columns identified above\n",
    "train_data.drop(columns=['Customer Email','Customer Password','Product Description','Product Status','Product Image',\n",
    "                         'Customer Fname','Customer Lname'],inplace=True)\n",
    "print(\"Total number of columns after dropping these features:\",len(train_data.columns))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of columns after dropping these features: 45\n"
     ]
    }
   ],
   "source": [
    "# The heatmap also shows pairs of features with very high correlation. Order Customer Id and Customer Id have correlation 1, so keep only one of them\n",
    "# Order Item Cardprod Id and Category Id: 0.99; Order Item Id and Order Id: 1; Sales and Sales per customer: 0.99\n",
    "# Order Item Total and Sales per customer: 1; Order Profit Per Order and Benefit per order: 1\n",
    "# Product Card Id and Category Id: 0.99; Product Category Id and Category Id: 1\n",
    "# Product Card Id and Product Category Id are each highly correlated with several other features, so dropping these two removes much of the redundancy\n",
    "# Product Price and Order Item Product Price: 1\n",
    "# From each highly correlated pair, drop one. Between Sales and Sales per customer, drop Sales per customer, because Sales is a target we predict later\n",
    "# Columns to drop: Order Customer Id, Order Item Cardprod Id, Order Item Id, Sales per customer, Order Item Total, Order Profit Per Order, Product Card Id,\n",
    "# Product Category Id, Product Price\n",
    "train_data.drop(columns=['Order Customer Id','Order Item Cardprod Id','Order Item Id','Sales per customer','Order Item Total',\n",
    "                         'Order Profit Per Order','Product Card Id','Product Category Id','Product Price'],inplace=True)\n",
    "print(\"Total number of columns after dropping these features:\",len(train_data.columns))  # 45 columns remain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Type                                  0\n",
       "Days for shipping (real)              0\n",
       "Days for shipment (scheduled)         0\n",
       "Benefit per order                     0\n",
       "Delivery Status                       0\n",
       "Late_delivery_risk                    0\n",
       "Category Id                           0\n",
       "Category Name                         0\n",
       "Customer City                         0\n",
       "Customer Country                      0\n",
       "Customer Id                           0\n",
       "Customer Segment                      0\n",
       "Customer State                        0\n",
       "Customer Street                       0\n",
       "Customer Zipcode                      0\n",
       "Department Id                         0\n",
       "Department Name                       0\n",
       "Latitude                              0\n",
       "Longitude                             0\n",
       "Market                                0\n",
       "Order City                            0\n",
       "Order Country                         0\n",
       "order date (DateOrders)               0\n",
       "Order Id                              0\n",
       "Order Item Discount                   0\n",
       "Order Item Discount Rate              0\n",
       "Order Item Product Price              0\n",
       "Order Item Profit Ratio               0\n",
       "Order Item Quantity                   0\n",
       "Sales                                 0\n",
       "Order Region                          0\n",
       "Order State                           0\n",
       "Order Status                          0\n",
       "Order Zipcode                    155679\n",
       "Product Name                          0\n",
       "shipping date (DateOrders)            0\n",
       "Shipping Mode                         0\n",
       "Customer Full Name                    8\n",
       "order_year                            0\n",
       "order_month                           0\n",
       "order_weekday                         0\n",
       "order_hour                            0\n",
       "order_month_year                      0\n",
       "fraud                                 0\n",
       "late_delivery                         0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 168,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Next, check the missing values\n",
    "train_data.isna().sum()\n",
    "# Order Zipcode is missing in 155679 of 180519 rows, so drop it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Type</th>\n",
       "      <th>Days for shipping (real)</th>\n",
       "      <th>Days for shipment (scheduled)</th>\n",
       "      <th>Benefit per order</th>\n",
       "      <th>Delivery Status</th>\n",
       "      <th>Late_delivery_risk</th>\n",
       "      <th>Category Id</th>\n",
       "      <th>Category Name</th>\n",
       "      <th>Customer City</th>\n",
       "      <th>Customer Country</th>\n",
       "      <th>Customer Id</th>\n",
       "      <th>Customer Segment</th>\n",
       "      <th>Customer State</th>\n",
       "      <th>Customer Street</th>\n",
       "      <th>Customer Zipcode</th>\n",
       "      <th>Department Id</th>\n",
       "      <th>Department Name</th>\n",
       "      <th>Latitude</th>\n",
       "      <th>Longitude</th>\n",
       "      <th>Market</th>\n",
       "      <th>Order City</th>\n",
       "      <th>Order Country</th>\n",
       "      <th>order date (DateOrders)</th>\n",
       "      <th>Order Id</th>\n",
       "      <th>Order Item Discount</th>\n",
       "      <th>Order Item Discount Rate</th>\n",
       "      <th>Order Item Product Price</th>\n",
       "      <th>Order Item Profit Ratio</th>\n",
       "      <th>Order Item Quantity</th>\n",
       "      <th>Sales</th>\n",
       "      <th>Order Region</th>\n",
       "      <th>Order State</th>\n",
       "      <th>Order Status</th>\n",
       "      <th>Order Zipcode</th>\n",
       "      <th>Product Name</th>\n",
       "      <th>shipping date (DateOrders)</th>\n",
       "      <th>Shipping Mode</th>\n",
       "      <th>Customer Full Name</th>\n",
       "      <th>order_year</th>\n",
       "      <th>order_month</th>\n",
       "      <th>order_weekday</th>\n",
       "      <th>order_hour</th>\n",
       "      <th>order_month_year</th>\n",
       "      <th>fraud</th>\n",
       "      <th>late_delivery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>91.250000</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>20755</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>5365 Noble Nectar Island</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.251453</td>\n",
       "      <td>-66.037056</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bekasi</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>2018-01-31 22:56:00</td>\n",
       "      <td>77202</td>\n",
       "      <td>13.110000</td>\n",
       "      <td>0.04</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.29</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>Southeast Asia</td>\n",
       "      <td>Java Occidental</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>2/3/2018 22:56</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HollowayCally</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>22</td>\n",
       "      <td>2018-01</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>-249.089996</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>19492</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>2679 Rustic Loop</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.279451</td>\n",
       "      <td>-66.037064</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>2018-01-13 12:27:00</td>\n",
       "      <td>75939</td>\n",
       "      <td>16.389999</td>\n",
       "      <td>0.05</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>-0.80</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>PENDING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>1/18/2018 12:27</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LunaIrene</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "      <td>2018-01</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CASH</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>-247.779999</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>San Jose</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>19491</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>CA</td>\n",
       "      <td>8510 Round Bear Gate</td>\n",
       "      <td>95125.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>37.292233</td>\n",
       "      <td>-121.881279</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>2018-01-13 12:06:00</td>\n",
       "      <td>75938</td>\n",
       "      <td>18.030001</td>\n",
       "      <td>0.06</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>-0.80</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>CLOSED</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>1/17/2018 12:06</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>MaldonadoGillian</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>12</td>\n",
       "      <td>2018-01</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>22.860001</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>19490</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>CA</td>\n",
       "      <td>3200 Amber Bend</td>\n",
       "      <td>90027.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>34.125946</td>\n",
       "      <td>-118.291016</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>2018-01-13 11:45:00</td>\n",
       "      <td>75937</td>\n",
       "      <td>22.940001</td>\n",
       "      <td>0.07</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.08</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>1/16/2018 11:45</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>TateTana</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>11</td>\n",
       "      <td>2018-01</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>134.210007</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>19489</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>PR</td>\n",
       "      <td>8671 Iron Anchor Corners</td>\n",
       "      <td>725.0</td>\n",
       "      <td>2</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>18.253769</td>\n",
       "      <td>-66.037048</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>2018-01-13 11:24:00</td>\n",
       "      <td>75936</td>\n",
       "      <td>29.500000</td>\n",
       "      <td>0.09</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>0.45</td>\n",
       "      <td>1</td>\n",
       "      <td>327.750000</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>1/15/2018 11:24</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HendricksOrli</td>\n",
       "      <td>2018</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>11</td>\n",
       "      <td>2018-01</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180514</th>\n",
       "      <td>CASH</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>40.000000</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>1005</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>NY</td>\n",
       "      <td>1322 Broad Glade</td>\n",
       "      <td>11207.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>40.640930</td>\n",
       "      <td>-73.942711</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>China</td>\n",
       "      <td>2016-01-16 03:40:00</td>\n",
       "      <td>26043</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.00</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.10</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>CLOSED</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>1/20/2016 3:40</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PetersonMaria</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>2016-01</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180515</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>-613.770019</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bakersfield</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>9141</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CA</td>\n",
       "      <td>7330 Broad Apple Moor</td>\n",
       "      <td>93304.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>35.362545</td>\n",
       "      <td>-119.018700</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Hirakata</td>\n",
       "      <td>Japón</td>\n",
       "      <td>2016-01-16 01:34:00</td>\n",
       "      <td>26037</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>0.01</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>-1.55</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Osaka</td>\n",
       "      <td>COMPLETE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>1/19/2016 1:34</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>ClarkRonald</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>2016-01</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180516</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>141.110001</td>\n",
       "      <td>Late delivery</td>\n",
       "      <td>1</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bristol</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>291</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CT</td>\n",
       "      <td>97 Burning Landing</td>\n",
       "      <td>6010.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>41.629959</td>\n",
       "      <td>-72.967155</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>2016-01-15 21:00:00</td>\n",
       "      <td>26024</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>0.02</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.36</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>PENDING</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>1/20/2016 21:00</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithJohn</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>21</td>\n",
       "      <td>2016-01</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180517</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>186.229996</td>\n",
       "      <td>Advance shipping</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>2813</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>2585 Silent Autumn Landing</td>\n",
       "      <td>725.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>18.213350</td>\n",
       "      <td>-66.370575</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>2016-01-15 20:18:00</td>\n",
       "      <td>26022</td>\n",
       "      <td>12.000000</td>\n",
       "      <td>0.03</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.48</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>1/18/2016 20:18</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithMary</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>20</td>\n",
       "      <td>2016-01</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180518</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>168.949997</td>\n",
       "      <td>Shipping on time</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>7547</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>697 Little Meadow</td>\n",
       "      <td>725.0</td>\n",
       "      <td>7</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>18.290380</td>\n",
       "      <td>-66.370613</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Nagercoil</td>\n",
       "      <td>India</td>\n",
       "      <td>2016-01-15 18:54:00</td>\n",
       "      <td>26018</td>\n",
       "      <td>16.000000</td>\n",
       "      <td>0.04</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>0.44</td>\n",
       "      <td>1</td>\n",
       "      <td>399.980011</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Tamil Nadu</td>\n",
       "      <td>PENDING_PAYMENT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>1/19/2016 18:54</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>OrtegaAndrea</td>\n",
       "      <td>2016</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>2016-01</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>180519 rows × 45 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Type  Days for shipping (real)  Days for shipment (scheduled)  \\\n",
       "0          DEBIT                         3                              4   \n",
       "1       TRANSFER                         5                              4   \n",
       "2           CASH                         4                              4   \n",
       "3          DEBIT                         3                              4   \n",
       "4        PAYMENT                         2                              4   \n",
       "...          ...                       ...                            ...   \n",
       "180514      CASH                         4                              4   \n",
       "180515     DEBIT                         3                              2   \n",
       "180516  TRANSFER                         5                              4   \n",
       "180517   PAYMENT                         3                              4   \n",
       "180518   PAYMENT                         4                              4   \n",
       "\n",
       "        Benefit per order   Delivery Status  Late_delivery_risk  Category Id  \\\n",
       "0               91.250000  Advance shipping                   0           73   \n",
       "1             -249.089996     Late delivery                   1           73   \n",
       "2             -247.779999  Shipping on time                   0           73   \n",
       "3               22.860001  Advance shipping                   0           73   \n",
       "4              134.210007  Advance shipping                   0           73   \n",
       "...                   ...               ...                 ...          ...   \n",
       "180514          40.000000  Shipping on time                   0           45   \n",
       "180515        -613.770019     Late delivery                   1           45   \n",
       "180516         141.110001     Late delivery                   1           45   \n",
       "180517         186.229996  Advance shipping                   0           45   \n",
       "180518         168.949997  Shipping on time                   0           45   \n",
       "\n",
       "         Category Name Customer City Customer Country  Customer Id  \\\n",
       "0       Sporting Goods        Caguas      Puerto Rico        20755   \n",
       "1       Sporting Goods        Caguas      Puerto Rico        19492   \n",
       "2       Sporting Goods      San Jose          EE. UU.        19491   \n",
       "3       Sporting Goods   Los Angeles          EE. UU.        19490   \n",
       "4       Sporting Goods        Caguas      Puerto Rico        19489   \n",
       "...                ...           ...              ...          ...   \n",
       "180514         Fishing      Brooklyn          EE. UU.         1005   \n",
       "180515         Fishing   Bakersfield          EE. UU.         9141   \n",
       "180516         Fishing       Bristol          EE. UU.          291   \n",
       "180517         Fishing        Caguas      Puerto Rico         2813   \n",
       "180518         Fishing        Caguas      Puerto Rico         7547   \n",
       "\n",
       "       Customer Segment Customer State             Customer Street  \\\n",
       "0              Consumer             PR    5365 Noble Nectar Island   \n",
       "1              Consumer             PR            2679 Rustic Loop   \n",
       "2              Consumer             CA        8510 Round Bear Gate   \n",
       "3           Home Office             CA             3200 Amber Bend   \n",
       "4             Corporate             PR    8671 Iron Anchor Corners   \n",
       "...                 ...            ...                         ...   \n",
       "180514      Home Office             NY            1322 Broad Glade   \n",
       "180515        Corporate             CA       7330 Broad Apple Moor   \n",
       "180516        Corporate             CT          97 Burning Landing   \n",
       "180517         Consumer             PR  2585 Silent Autumn Landing   \n",
       "180518         Consumer             PR           697 Little Meadow   \n",
       "\n",
       "        Customer Zipcode  Department Id Department Name   Latitude  \\\n",
       "0                  725.0              2         Fitness  18.251453   \n",
       "1                  725.0              2         Fitness  18.279451   \n",
       "2                95125.0              2         Fitness  37.292233   \n",
       "3                90027.0              2         Fitness  34.125946   \n",
       "4                  725.0              2         Fitness  18.253769   \n",
       "...                  ...            ...             ...        ...   \n",
       "180514           11207.0              7        Fan Shop  40.640930   \n",
       "180515           93304.0              7        Fan Shop  35.362545   \n",
       "180516            6010.0              7        Fan Shop  41.629959   \n",
       "180517             725.0              7        Fan Shop  18.213350   \n",
       "180518             725.0              7        Fan Shop  18.290380   \n",
       "\n",
       "         Longitude        Market  Order City Order Country  \\\n",
       "0       -66.037056  Pacific Asia      Bekasi     Indonesia   \n",
       "1       -66.037064  Pacific Asia     Bikaner         India   \n",
       "2      -121.881279  Pacific Asia     Bikaner         India   \n",
       "3      -118.291016  Pacific Asia  Townsville     Australia   \n",
       "4       -66.037048  Pacific Asia  Townsville     Australia   \n",
       "...            ...           ...         ...           ...   \n",
       "180514  -73.942711  Pacific Asia    Shanghái         China   \n",
       "180515 -119.018700  Pacific Asia    Hirakata         Japón   \n",
       "180516  -72.967155  Pacific Asia    Adelaide     Australia   \n",
       "180517  -66.370575  Pacific Asia    Adelaide     Australia   \n",
       "180518  -66.370613  Pacific Asia   Nagercoil         India   \n",
       "\n",
       "       order date (DateOrders)  Order Id  Order Item Discount  \\\n",
       "0          2018-01-31 22:56:00     77202            13.110000   \n",
       "1          2018-01-13 12:27:00     75939            16.389999   \n",
       "2          2018-01-13 12:06:00     75938            18.030001   \n",
       "3          2018-01-13 11:45:00     75937            22.940001   \n",
       "4          2018-01-13 11:24:00     75936            29.500000   \n",
       "...                        ...       ...                  ...   \n",
       "180514     2016-01-16 03:40:00     26043             0.000000   \n",
       "180515     2016-01-16 01:34:00     26037             4.000000   \n",
       "180516     2016-01-15 21:00:00     26024             8.000000   \n",
       "180517     2016-01-15 20:18:00     26022            12.000000   \n",
       "180518     2016-01-15 18:54:00     26018            16.000000   \n",
       "\n",
       "        Order Item Discount Rate  Order Item Product Price  \\\n",
       "0                           0.04                327.750000   \n",
       "1                           0.05                327.750000   \n",
       "2                           0.06                327.750000   \n",
       "3                           0.07                327.750000   \n",
       "4                           0.09                327.750000   \n",
       "...                          ...                       ...   \n",
       "180514                      0.00                399.980011   \n",
       "180515                      0.01                399.980011   \n",
       "180516                      0.02                399.980011   \n",
       "180517                      0.03                399.980011   \n",
       "180518                      0.04                399.980011   \n",
       "\n",
       "        Order Item Profit Ratio  Order Item Quantity       Sales  \\\n",
       "0                          0.29                    1  327.750000   \n",
       "1                         -0.80                    1  327.750000   \n",
       "2                         -0.80                    1  327.750000   \n",
       "3                          0.08                    1  327.750000   \n",
       "4                          0.45                    1  327.750000   \n",
       "...                         ...                  ...         ...   \n",
       "180514                     0.10                    1  399.980011   \n",
       "180515                    -1.55                    1  399.980011   \n",
       "180516                     0.36                    1  399.980011   \n",
       "180517                     0.48                    1  399.980011   \n",
       "180518                     0.44                    1  399.980011   \n",
       "\n",
       "          Order Region        Order State     Order Status  Order Zipcode  \\\n",
       "0       Southeast Asia    Java Occidental         COMPLETE            NaN   \n",
       "1           South Asia           Rajastán          PENDING            NaN   \n",
       "2           South Asia           Rajastán           CLOSED            NaN   \n",
       "3              Oceania         Queensland         COMPLETE            NaN   \n",
       "4              Oceania         Queensland  PENDING_PAYMENT            NaN   \n",
       "...                ...                ...              ...            ...   \n",
       "180514    Eastern Asia           Shanghái           CLOSED            NaN   \n",
       "180515    Eastern Asia              Osaka         COMPLETE            NaN   \n",
       "180516         Oceania  Australia del Sur          PENDING            NaN   \n",
       "180517         Oceania  Australia del Sur  PENDING_PAYMENT            NaN   \n",
       "180518      South Asia         Tamil Nadu  PENDING_PAYMENT            NaN   \n",
       "\n",
       "                                     Product Name shipping date (DateOrders)  \\\n",
       "0                                    Smart watch              2/3/2018 22:56   \n",
       "1                                    Smart watch             1/18/2018 12:27   \n",
       "2                                    Smart watch             1/17/2018 12:06   \n",
       "3                                    Smart watch             1/16/2018 11:45   \n",
       "4                                    Smart watch             1/15/2018 11:24   \n",
       "...                                           ...                        ...   \n",
       "180514  Field & Stream Sportsman 16 Gun Fire Safe             1/20/2016 3:40   \n",
       "180515  Field & Stream Sportsman 16 Gun Fire Safe             1/19/2016 1:34   \n",
       "180516  Field & Stream Sportsman 16 Gun Fire Safe            1/20/2016 21:00   \n",
       "180517  Field & Stream Sportsman 16 Gun Fire Safe            1/18/2016 20:18   \n",
       "180518  Field & Stream Sportsman 16 Gun Fire Safe            1/19/2016 18:54   \n",
       "\n",
       "         Shipping Mode Customer Full Name  order_year  order_month  \\\n",
       "0       Standard Class      HollowayCally        2018            1   \n",
       "1       Standard Class          LunaIrene        2018            1   \n",
       "2       Standard Class   MaldonadoGillian        2018            1   \n",
       "3       Standard Class           TateTana        2018            1   \n",
       "4       Standard Class      HendricksOrli        2018            1   \n",
       "...                ...                ...         ...          ...   \n",
       "180514  Standard Class      PetersonMaria        2016            1   \n",
       "180515    Second Class        ClarkRonald        2016            1   \n",
       "180516  Standard Class          SmithJohn        2016            1   \n",
       "180517  Standard Class          SmithMary        2016            1   \n",
       "180518  Standard Class       OrtegaAndrea        2016            1   \n",
       "\n",
       "        order_weekday  order_hour order_month_year  fraud  late_delivery  \n",
       "0                   2          22          2018-01      0              0  \n",
       "1                   5          12          2018-01      0              1  \n",
       "2                   5          12          2018-01      0              0  \n",
       "3                   5          11          2018-01      0              0  \n",
       "4                   5          11          2018-01      0              0  \n",
       "...               ...         ...              ...    ...            ...  \n",
       "180514              5           3          2016-01      0              0  \n",
       "180515              5           1          2016-01      0              1  \n",
       "180516              4          21          2016-01      0              1  \n",
       "180517              4          20          2016-01      0              0  \n",
       "180518              4          18          2016-01      0              0  \n",
       "\n",
       "[180519 rows x 45 columns]"
      ]
     },
     "execution_count": 169,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.set_option('display.max_columns', 10000)\n",
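    "train_data  # shipping date (DateOrders) is no longer needed: the time data has already been split into order_year, order_month, order_weekday, and order_hour\n",
    "# Latitude and Longitude are likewise unnecessary, and Customer Street (the street the customer lives on) is unlikely to affect the result much"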
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of features after dropping: 40\n"
     ]
    }
   ],
   "source": [
    "train_data.drop(['Order Zipcode', 'shipping date (DateOrders)', 'Latitude', 'Longitude', 'Customer Street'], axis=1, inplace=True)\n",
    "print(\"Number of features after dropping:\", len(train_data.columns))  # 40 features remain once the unneeded columns identified above are dropped"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of features after dropping: 38\n"
     ]
    }
   ],
   "source": [
    "# Also drop the two remaining time columns: order date (DateOrders) and order_month_year\n",
    "train_data.drop(['order date (DateOrders)', 'order_month_year'], axis=1, inplace=True)\n",
    "print(\"Number of features after dropping:\", len(train_data.columns))  # 38 features remain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 Remove features that cause label leakage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of features after dropping: 35\n"
     ]
    }
   ],
   "source": [
    "train_data.drop(['Order Status', 'Delivery Status', 'Late_delivery_risk'], axis=1, inplace=True)\n",
    "# Order Status must be dropped: the fraud label predicted below was derived from it, so keeping it would leak the label. The same applies to Delivery Status and Late_delivery_risk.\n",
    "print(\"Number of features after dropping:\", len(train_data.columns))  # 35 features remain"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.6 Handle non-numeric features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 173,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Type', 'Category Name', 'Customer City', 'Customer Country',\n",
       "       'Customer Segment', 'Customer State', 'Department Name', 'Market',\n",
       "       'Order City', 'Order Country', 'Order Region', 'Order State',\n",
       "       'Product Name', 'Shipping Mode', 'Customer Full Name'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 173,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Inspect the categorical features\n",
    "# Select the columns with object dtype\n",
    "categorical_cols = train_data.select_dtypes(include='object').columns\n",
    "categorical_cols  # all object-dtype columns among the remaining features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 174,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Type</th>\n",
       "      <th>Category Name</th>\n",
       "      <th>Customer City</th>\n",
       "      <th>Customer Country</th>\n",
       "      <th>Customer Segment</th>\n",
       "      <th>Customer State</th>\n",
       "      <th>Department Name</th>\n",
       "      <th>Market</th>\n",
       "      <th>Order City</th>\n",
       "      <th>Order Country</th>\n",
       "      <th>Order Region</th>\n",
       "      <th>Order State</th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Shipping Mode</th>\n",
       "      <th>Customer Full Name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bekasi</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>Southeast Asia</td>\n",
       "      <td>Java Occidental</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HollowayCally</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>LunaIrene</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CASH</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>San Jose</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>CA</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Bikaner</td>\n",
       "      <td>India</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Rajastán</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>MaldonadoGillian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>CA</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>TateTana</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>Sporting Goods</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>PR</td>\n",
       "      <td>Fitness</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Townsville</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Queensland</td>\n",
       "      <td>Smart watch</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>HendricksOrli</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180514</th>\n",
       "      <td>CASH</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>Home Office</td>\n",
       "      <td>NY</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>China</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Shanghái</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>PetersonMaria</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180515</th>\n",
       "      <td>DEBIT</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bakersfield</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CA</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Hirakata</td>\n",
       "      <td>Japón</td>\n",
       "      <td>Eastern Asia</td>\n",
       "      <td>Osaka</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>Second Class</td>\n",
       "      <td>ClarkRonald</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180516</th>\n",
       "      <td>TRANSFER</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Bristol</td>\n",
       "      <td>EE. UU.</td>\n",
       "      <td>Corporate</td>\n",
       "      <td>CT</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithJohn</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180517</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Adelaide</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Oceania</td>\n",
       "      <td>Australia del Sur</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>SmithMary</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180518</th>\n",
       "      <td>PAYMENT</td>\n",
       "      <td>Fishing</td>\n",
       "      <td>Caguas</td>\n",
       "      <td>Puerto Rico</td>\n",
       "      <td>Consumer</td>\n",
       "      <td>PR</td>\n",
       "      <td>Fan Shop</td>\n",
       "      <td>Pacific Asia</td>\n",
       "      <td>Nagercoil</td>\n",
       "      <td>India</td>\n",
       "      <td>South Asia</td>\n",
       "      <td>Tamil Nadu</td>\n",
       "      <td>Field &amp; Stream Sportsman 16 Gun Fire Safe</td>\n",
       "      <td>Standard Class</td>\n",
       "      <td>OrtegaAndrea</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>180519 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Type   Category Name Customer City Customer Country  \\\n",
       "0          DEBIT  Sporting Goods        Caguas      Puerto Rico   \n",
       "1       TRANSFER  Sporting Goods        Caguas      Puerto Rico   \n",
       "2           CASH  Sporting Goods      San Jose          EE. UU.   \n",
       "3          DEBIT  Sporting Goods   Los Angeles          EE. UU.   \n",
       "4        PAYMENT  Sporting Goods        Caguas      Puerto Rico   \n",
       "...          ...             ...           ...              ...   \n",
       "180514      CASH         Fishing      Brooklyn          EE. UU.   \n",
       "180515     DEBIT         Fishing   Bakersfield          EE. UU.   \n",
       "180516  TRANSFER         Fishing       Bristol          EE. UU.   \n",
       "180517   PAYMENT         Fishing        Caguas      Puerto Rico   \n",
       "180518   PAYMENT         Fishing        Caguas      Puerto Rico   \n",
       "\n",
       "       Customer Segment Customer State Department Name        Market  \\\n",
       "0              Consumer             PR         Fitness  Pacific Asia   \n",
       "1              Consumer             PR         Fitness  Pacific Asia   \n",
       "2              Consumer             CA         Fitness  Pacific Asia   \n",
       "3           Home Office             CA         Fitness  Pacific Asia   \n",
       "4             Corporate             PR         Fitness  Pacific Asia   \n",
       "...                 ...            ...             ...           ...   \n",
       "180514      Home Office             NY        Fan Shop  Pacific Asia   \n",
       "180515        Corporate             CA        Fan Shop  Pacific Asia   \n",
       "180516        Corporate             CT        Fan Shop  Pacific Asia   \n",
       "180517         Consumer             PR        Fan Shop  Pacific Asia   \n",
       "180518         Consumer             PR        Fan Shop  Pacific Asia   \n",
       "\n",
       "        Order City Order Country    Order Region        Order State  \\\n",
       "0           Bekasi     Indonesia  Southeast Asia    Java Occidental   \n",
       "1          Bikaner         India      South Asia           Rajastán   \n",
       "2          Bikaner         India      South Asia           Rajastán   \n",
       "3       Townsville     Australia         Oceania         Queensland   \n",
       "4       Townsville     Australia         Oceania         Queensland   \n",
       "...            ...           ...             ...                ...   \n",
       "180514    Shanghái         China    Eastern Asia           Shanghái   \n",
       "180515    Hirakata         Japón    Eastern Asia              Osaka   \n",
       "180516    Adelaide     Australia         Oceania  Australia del Sur   \n",
       "180517    Adelaide     Australia         Oceania  Australia del Sur   \n",
       "180518   Nagercoil         India      South Asia         Tamil Nadu   \n",
       "\n",
       "                                     Product Name   Shipping Mode  \\\n",
       "0                                    Smart watch   Standard Class   \n",
       "1                                    Smart watch   Standard Class   \n",
       "2                                    Smart watch   Standard Class   \n",
       "3                                    Smart watch   Standard Class   \n",
       "4                                    Smart watch   Standard Class   \n",
       "...                                           ...             ...   \n",
       "180514  Field & Stream Sportsman 16 Gun Fire Safe  Standard Class   \n",
       "180515  Field & Stream Sportsman 16 Gun Fire Safe    Second Class   \n",
       "180516  Field & Stream Sportsman 16 Gun Fire Safe  Standard Class   \n",
       "180517  Field & Stream Sportsman 16 Gun Fire Safe  Standard Class   \n",
       "180518  Field & Stream Sportsman 16 Gun Fire Safe  Standard Class   \n",
       "\n",
       "       Customer Full Name  \n",
       "0           HollowayCally  \n",
       "1               LunaIrene  \n",
       "2        MaldonadoGillian  \n",
       "3                TateTana  \n",
       "4           HendricksOrli  \n",
       "...                   ...  \n",
       "180514      PetersonMaria  \n",
       "180515        ClarkRonald  \n",
       "180516          SmithJohn  \n",
       "180517          SmithMary  \n",
       "180518       OrtegaAndrea  \n",
       "\n",
       "[180519 rows x 15 columns]"
      ]
     },
     "execution_count": 174,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data[categorical_cols]#这些分类类型都是字符串类型的  然后不能放在模型里面去预测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 175,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_data['Customer Full Name'].isnull().sum()#这个名字里面有8个缺失值  把这些数据删除吧"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(180511, 35)\n"
     ]
    }
   ],
   "source": [
    "train_data=train_data.dropna(subset=['Customer Full Name'])\n",
    "print(train_data.shape)#这样整个数据就少了8行数据   数据的总量从180519变成了180511"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 177,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  \"\"\"\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Type</th>\n",
       "      <th>Category Name</th>\n",
       "      <th>Customer City</th>\n",
       "      <th>Customer Country</th>\n",
       "      <th>Customer Segment</th>\n",
       "      <th>Customer State</th>\n",
       "      <th>Department Name</th>\n",
       "      <th>Market</th>\n",
       "      <th>Order City</th>\n",
       "      <th>Order Country</th>\n",
       "      <th>Order Region</th>\n",
       "      <th>Order State</th>\n",
       "      <th>Product Name</th>\n",
       "      <th>Shipping Mode</th>\n",
       "      <th>Customer Full Name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>40</td>\n",
       "      <td>66</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>36</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>331</td>\n",
       "      <td>70</td>\n",
       "      <td>15</td>\n",
       "      <td>475</td>\n",
       "      <td>78</td>\n",
       "      <td>3</td>\n",
       "      <td>5638</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>40</td>\n",
       "      <td>66</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>36</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>391</td>\n",
       "      <td>69</td>\n",
       "      <td>13</td>\n",
       "      <td>841</td>\n",
       "      <td>78</td>\n",
       "      <td>3</td>\n",
       "      <td>7388</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>452</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>391</td>\n",
       "      <td>69</td>\n",
       "      <td>13</td>\n",
       "      <td>841</td>\n",
       "      <td>78</td>\n",
       "      <td>3</td>\n",
       "      <td>7510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>40</td>\n",
       "      <td>285</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>3226</td>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>835</td>\n",
       "      <td>78</td>\n",
       "      <td>3</td>\n",
       "      <td>12404</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2</td>\n",
       "      <td>40</td>\n",
       "      <td>66</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>36</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>3226</td>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>835</td>\n",
       "      <td>78</td>\n",
       "      <td>3</td>\n",
       "      <td>5318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180514</th>\n",
       "      <td>0</td>\n",
       "      <td>18</td>\n",
       "      <td>59</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>31</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>2922</td>\n",
       "      <td>31</td>\n",
       "      <td>7</td>\n",
       "      <td>913</td>\n",
       "      <td>24</td>\n",
       "      <td>3</td>\n",
       "      <td>9834</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180515</th>\n",
       "      <td>1</td>\n",
       "      <td>18</td>\n",
       "      <td>26</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>1362</td>\n",
       "      <td>77</td>\n",
       "      <td>7</td>\n",
       "      <td>770</td>\n",
       "      <td>24</td>\n",
       "      <td>2</td>\n",
       "      <td>2275</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180516</th>\n",
       "      <td>3</td>\n",
       "      <td>18</td>\n",
       "      <td>55</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>25</td>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>88</td>\n",
       "      <td>24</td>\n",
       "      <td>3</td>\n",
       "      <td>11821</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180517</th>\n",
       "      <td>2</td>\n",
       "      <td>18</td>\n",
       "      <td>66</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>36</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>25</td>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>88</td>\n",
       "      <td>24</td>\n",
       "      <td>3</td>\n",
       "      <td>11861</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>180518</th>\n",
       "      <td>2</td>\n",
       "      <td>18</td>\n",
       "      <td>66</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>36</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>2203</td>\n",
       "      <td>69</td>\n",
       "      <td>13</td>\n",
       "      <td>967</td>\n",
       "      <td>24</td>\n",
       "      <td>3</td>\n",
       "      <td>9381</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>180511 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        Type  Category Name  Customer City  Customer Country  \\\n",
       "0          1             40             66                 1   \n",
       "1          3             40             66                 1   \n",
       "2          0             40            452                 0   \n",
       "3          1             40            285                 0   \n",
       "4          2             40             66                 1   \n",
       "...      ...            ...            ...               ...   \n",
       "180514     0             18             59                 0   \n",
       "180515     1             18             26                 0   \n",
       "180516     3             18             55                 0   \n",
       "180517     2             18             66                 1   \n",
       "180518     2             18             66                 1   \n",
       "\n",
       "        Customer Segment  Customer State  Department Name  Market  Order City  \\\n",
       "0                      0              36                4       3         331   \n",
       "1                      0              36                4       3         391   \n",
       "2                      0               5                4       3         391   \n",
       "3                      2               5                4       3        3226   \n",
       "4                      1              36                4       3        3226   \n",
       "...                  ...             ...              ...     ...         ...   \n",
       "180514                 2              31                3       3        2922   \n",
       "180515                 1               5                3       3        1362   \n",
       "180516                 1               7                3       3          25   \n",
       "180517                 0              36                3       3          25   \n",
       "180518                 0              36                3       3        2203   \n",
       "\n",
       "        Order Country  Order Region  Order State  Product Name  Shipping Mode  \\\n",
       "0                  70            15          475            78              3   \n",
       "1                  69            13          841            78              3   \n",
       "2                  69            13          841            78              3   \n",
       "3                   8            11          835            78              3   \n",
       "4                   8            11          835            78              3   \n",
       "...               ...           ...          ...           ...            ...   \n",
       "180514             31             7          913            24              3   \n",
       "180515             77             7          770            24              2   \n",
       "180516              8            11           88            24              3   \n",
       "180517              8            11           88            24              3   \n",
       "180518             69            13          967            24              3   \n",
       "\n",
       "        Customer Full Name  \n",
       "0                     5638  \n",
       "1                     7388  \n",
       "2                     7510  \n",
       "3                    12404  \n",
       "4                     5318  \n",
       "...                    ...  \n",
       "180514                9834  \n",
       "180515                2275  \n",
       "180516               11821  \n",
       "180517               11861  \n",
       "180518                9381  \n",
       "\n",
       "[180511 rows x 15 columns]"
      ]
     },
     "execution_count": 177,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#做一下labelEncoder\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "le=LabelEncoder()\n",
    "for cat in categorical_cols:\n",
    "    train_data[cat]=le.fit_transform(train_data[cat])\n",
    "train_data[categorical_cols]#这样就把所有的object特征给换成了字符型 用labelEncoder的方式"
   ]
  },
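  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above reuses a single `LabelEncoder`, so once it finishes only the mapping for the last column is still held in `le`. The sketch below is one possible variant, not this notebook's own method: it keeps one fitted encoder per column so the integer codes can be mapped back to the original strings. The helper `encode_columns` and the frame `toy` are hypothetical names used only for illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "\n",
    "def encode_columns(df, cols):\n",
    "    # Encode a copy so the caller's frame is left untouched.\n",
    "    out = df.copy()\n",
    "    encoders = {}\n",
    "    for col in cols:\n",
    "        enc = LabelEncoder()\n",
    "        out[col] = enc.fit_transform(out[col])\n",
    "        encoders[col] = enc  # keep the fitted encoder for inverse_transform\n",
    "    return out, encoders\n",
    "\n",
    "# Toy data standing in for the real categorical columns (assumption).\n",
    "toy = pd.DataFrame({'Type': ['DEBIT', 'CASH', 'DEBIT'],\n",
    "                    'Market': ['Europe', 'USCA', 'Europe']})\n",
    "encoded, encoders = encode_columns(toy, ['Type', 'Market'])\n",
    "restored = encoders['Type'].inverse_transform(encoded['Type'])\n",
    "```\n",
    "\n",
    "Because the helper encodes a copy, the input frame keeps its string values, and `restored` recovers the original `Type` labels from the codes."
   ]
  },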
  {
   "cell_type": "code",
   "execution_count": 178,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Type', 'Days for shipping (real)', 'Days for shipment (scheduled)', 'Benefit per order', 'Category Id', 'Category Name', 'Customer City', 'Customer Country', 'Customer Id', 'Customer Segment', 'Customer State', 'Customer Zipcode', 'Department Id', 'Department Name', 'Market', 'Order City', 'Order Country', 'Order Id', 'Order Item Discount', 'Order Item Discount Rate', 'Order Item Product Price', 'Order Item Profit Ratio', 'Order Item Quantity', 'Sales', 'Order Region', 'Order State', 'Product Name', 'Shipping Mode', 'Customer Full Name', 'order_year', 'order_month', 'order_weekday', 'order_hour', 'fraud', 'late_delivery'] \n",
      "----------------\n",
      "Index(['Type', 'Category Name', 'Customer City', 'Customer Country',\n",
      "       'Customer Segment', 'Customer State', 'Department Name', 'Market',\n",
      "       'Order City', 'Order Country', 'Order Region', 'Order State',\n",
      "       'Product Name', 'Shipping Mode', 'Customer Full Name'],\n",
      "      dtype='object')\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['Days for shipping (real)',\n",
       " 'Days for shipment (scheduled)',\n",
       " 'Benefit per order',\n",
       " 'Category Id',\n",
       " 'Customer Id',\n",
       " 'Customer Zipcode',\n",
       " 'Department Id',\n",
       " 'Order Id',\n",
       " 'Order Item Discount',\n",
       " 'Order Item Discount Rate',\n",
       " 'Order Item Product Price',\n",
       " 'Order Item Profit Ratio',\n",
       " 'Order Item Quantity',\n",
       " 'Sales',\n",
       " 'order_year',\n",
       " 'order_month',\n",
       " 'order_weekday',\n",
       " 'order_hour',\n",
       " 'fraud',\n",
       " 'late_delivery']"
      ]
     },
     "execution_count": 178,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#得到所有的数值类型\n",
    "numerical_columns=train_data.columns.tolist()\n",
    "print(numerical_columns,'\\n----------------')\n",
    "print(categorical_cols)\n",
    "for x in categorical_cols.tolist():\n",
    "    numerical_columns.remove(x)#numerical_columns是train_data里面所有columns的列索引  categorical_cols是刚才提出来的类别的索引\n",
    "    #这样做的目的是把得到 非类别型的索引\n",
    "numerical_columns#这样就拿到了 所有的数值类型的索引"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3fraud欺诈预测 准备特征和标签 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 179,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(180511, 34)\n",
      "(180511,)\n"
     ]
    }
   ],
   "source": [
    "#fraud欺诈是我们要预测的一个目标  先来做一下这个的预测\n",
    "x_fraud=train_data.loc[:,train_data.columns!='fraud']#拿到x_fruad除了fruad之外的特征\n",
    "y_fraud=train_data['fraud']\n",
    "print(x_fraud.shape)\n",
    "print(y_fraud.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4数据切分和模型预测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(144408, 34)\n",
      "(36103, 34)\n",
      "(144408,)\n",
      "(36103,)\n"
     ]
    }
   ],
   "source": [
    "#再去做训练之前要做一下数据集的切分\n",
    "from sklearn.model_selection import train_test_split\n",
    "#数据集切分\n",
    "x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test=train_test_split(x_fraud,y_fraud,test_size=0.2)\n",
    "print(x_fraud_train.shape)\n",
    "print(x_fraud_test.shape)\n",
    "print(y_fraud_train.shape)\n",
    "print(y_fraud_test.shape)#已经切分好了"
   ]
  },
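  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fraud is a rare class here (the stored confusion matrices show 819 fraud rows among the 36,103 test rows), so a plain random split can shift the class ratio between train and test and gives different rows on every run. The sketch below, on synthetic data rather than this dataset, shows the `stratify` and `random_state` arguments of `train_test_split`; the arrays `X` and `y` and the seed values are assumptions made for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "X = rng.normal(size=(1000, 3))\n",
    "y = np.zeros(1000, dtype=int)\n",
    "y[:20] = 1  # 2% positives, a stand-in for the rare fraud label\n",
    "\n",
    "# stratify=y keeps the class ratio equal in both parts;\n",
    "# random_state makes the split reproducible.\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.2, stratify=y, random_state=42\n",
    ")\n",
    "print(y_train.mean(), y_test.mean())\n",
    "```\n",
    "\n",
    "With stratification both parts keep the same share of positives, which makes recall and F1 on the test set less sensitive to the luck of the split."
   ]
  },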
  {
   "cell_type": "code",
   "execution_count": 181,
   "metadata": {},
   "outputs": [],
   "source": [
    "#做一下数据的规范化处理\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "sc=StandardScaler()\n",
    "x_fraud_train=sc.fit_transform(x_fraud_train)\n",
    "x_fraud_test=sc.transform(x_fraud_test)#这样就把数据的规范化给做好了"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 182,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.32616536,  0.92608842, -0.6771065 , ...,  0.4937184 ,\n",
       "        -1.36966459,  0.90808781],\n",
       "       [-0.67976948, -1.53746432, -2.1318353 , ..., -1.50465274,\n",
       "         0.6546569 ,  0.90808781],\n",
       "       [ 0.32616536,  0.31020023,  0.77762229, ..., -1.50465274,\n",
       "        -1.2250702 , -1.10121509],\n",
       "       ...,\n",
       "       [ 0.32616536,  0.92608842, -0.6771065 , ..., -0.50546717,\n",
       "         0.07627933,  0.90808781],\n",
       "       [-0.67976948,  0.92608842,  0.77762229, ..., -0.50546717,\n",
       "         1.23303447,  0.90808781],\n",
       "       [-0.67976948, -0.92157613, -1.4044709 , ..., -0.50546717,\n",
       "        -1.65885338,  0.90808781]])"
      ]
     },
     "execution_count": 182,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_fraud_train#数据已经标准化了"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.32616536, -0.30568795,  0.77762229, ...,  0.4937184 ,\n",
       "         1.23303447, -1.10121509],\n",
       "       [ 1.3321002 , -0.30568795,  0.77762229, ..., -1.50465274,\n",
       "         0.36546811, -1.10121509],\n",
       "       [-0.67976948, -0.92157613, -1.4044709 , ..., -0.00587439,\n",
       "         1.52222325,  0.90808781],\n",
       "       ...,\n",
       "       [ 1.3321002 ,  1.5419766 ,  0.77762229, ..., -0.00587439,\n",
       "         1.37762886,  0.90808781],\n",
       "       [ 0.32616536,  0.31020023,  0.77762229, ..., -0.50546717,\n",
       "         0.51006251, -1.10121509],\n",
       "       [ 0.32616536,  1.5419766 ,  0.77762229, ...,  0.99331118,\n",
       "        -0.64669263,  0.90808781]])"
      ]
     },
     "execution_count": 183,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_fraud_test#这个也是"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 204,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score ,recall_score,roc_auc_score,confusion_matrix,f1_score\n",
    "\n",
    "accuracy_list={}\n",
    "recall_list={}\n",
    "auc_list={}\n",
    "f1_list={}\n",
    "\n",
    "def model_stats(model,x_train,x_test,y_train,y_test,name='Fraud'):\n",
    "    #写一个函数 输入模型 训练集 测试集的特征和标签  已经要预测的字段的名称  \n",
    "    #然后返回预测的效果  包括预测的指标的名称  使用的模型 准确率 召回率 Auc的值 还有F1-score 以及混淆矩阵    \n",
    "    model=model.fit(x_train,y_train)#模型训练一下数据集\n",
    "    y_pred=model.predict(x_test)#预测y的值\n",
    "    accuracy=accuracy_score(y_pred,y_test)#算一下准确率\n",
    "    recall=recall_score(y_pred,y_test)#这个是召回率\n",
    "    auc=roc_auc_score(y_pred,y_test)   #这个是auc指标\n",
    "    f1=f1_score(y_pred,y_test)   #这个是F1-score\n",
    "    confusion=confusion_matrix(y_pred,y_test)#这个是稀疏矩阵\n",
    "    \n",
    "    accuracy_list[name,model]=accuracy\n",
    "    recall_list[name,model]=recall\n",
    "    auc_list[name,model]=auc\n",
    "    f1_list[name,model]=f1\n",
    "    \n",
    "    print(\"预测:{}\".format(name),'-'*80)\n",
    "    print(\"使用的模型：{}\".format(model))\n",
    "    print(\"准确率：{}\".format(accuracy))\n",
    "    print(\"召回率：{}\".format(recall))\n",
    "    print(\"Auc：{}\".format(auc))\n",
    "    print(\"F1-score：{}\".format(f1))\n",
    "    print(\"混淆矩阵：\\n{}\".format(confusion))\n",
    "    print(\"预测:{}\".format(name),'-'*80)    "
   ]
  },
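  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "scikit-learn metric functions expect the true labels first, as in `metric(y_true, y_pred)`, and `roc_auc_score` is most informative when it receives scores or probabilities instead of hard 0/1 predictions. The following is a minimal sketch on synthetic data, assuming the classifier exposes `predict_proba`; the helper name `evaluate` and the toy arrays are hypothetical and are not part of this notebook's pipeline.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import (recall_score, precision_score,\n",
    "                             roc_auc_score, confusion_matrix)\n",
    "\n",
    "def evaluate(model, x_test, y_test):\n",
    "    y_pred = model.predict(x_test)\n",
    "    y_score = model.predict_proba(x_test)[:, 1]  # probability of the positive class\n",
    "    return {\n",
    "        'recall': recall_score(y_test, y_pred),\n",
    "        'precision': precision_score(y_test, y_pred, zero_division=0),\n",
    "        'auc': roc_auc_score(y_test, y_score),  # ranks by score, not by 0/1 label\n",
    "        'confusion': confusion_matrix(y_test, y_pred),  # rows: true, columns: predicted\n",
    "    }\n",
    "\n",
    "# Synthetic, imbalanced toy problem (assumption, not the notebook's data).\n",
    "rng = np.random.RandomState(0)\n",
    "x = rng.normal(size=(400, 2))\n",
    "y = (x[:, 0] + 0.5 * rng.normal(size=400) > 1.0).astype(int)\n",
    "clf = LogisticRegression().fit(x[:300], y[:300])\n",
    "stats = evaluate(clf, x[300:], y[300:])\n",
    "```\n",
    "\n",
    "Reporting precision next to recall helps on an imbalanced target such as fraud, where accuracy alone stays high even if no positive case is found."
   ]
  },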
  {
   "cell_type": "code",
   "execution_count": 205,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "使用的模型：LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
      "                   intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
      "                   multi_class='auto', n_jobs=None, penalty='l2',\n",
      "                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
      "                   warm_start=False)\n",
      "准确率：0.9782289560424341\n",
      "召回率：0.5544554455445545\n",
      "Auc：0.7681355440013277\n",
      "F1-score：0.29946524064171126\n",
      "混淆矩阵：\n",
      "[[35149   651]\n",
      " [  135   168]]\n",
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "#先用下逻辑回归的模型\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "model_fraud_LR=LogisticRegression()\n",
    "model_stats(model_fraud_LR,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'预测是否欺诈订单fraud')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5late_delivery订单延期预测（类似仿照步骤3-4）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 186,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(180511, 34)\n",
      "(180511,)\n",
      "(144408, 34)\n",
      "(36103, 34)\n",
      "(144408,)\n",
      "(36103,)\n"
     ]
    }
   ],
   "source": [
    "x_late=train_data.loc[:,train_data.columns!='late_delivery']\n",
    "y_late=train_data['late_delivery']\n",
    "print(x_late.shape)\n",
    "print(y_late.shape)\n",
    "x_late_train,x_late_test,y_late_train,y_late_test=train_test_split(x_late,y_late,test_size=0.2)\n",
    "print(x_late_train.shape)\n",
    "print(x_late_test.shape)\n",
    "print(y_late_train.shape)\n",
    "print(y_late_test.shape)#已经切分好了"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 187,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "17.468979873688138\n",
      "22.073616717279958\n"
     ]
    }
   ],
   "source": [
    "#数据规范化\n",
    "sd=StandardScaler()\n",
    "x_late_train=sd.fit_transform(x_late_train)\n",
    "x_late_test=sd.transform(x_late_test)#这样就把数据的规范化给做好了\n",
    "print(np.max(x_late_train))\n",
    "print(np.max(x_late_test))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 206,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n",
      "使用的模型：LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
      "                   intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
      "                   multi_class='auto', n_jobs=None, penalty='l2',\n",
      "                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
      "                   warm_start=False)\n",
      "准确率：0.988809794199928\n",
      "召回率：0.9800345935260687\n",
      "Auc：0.9900172967630343\n",
      "F1-score：0.9899166375480458\n",
      "混淆矩阵：\n",
      "[[15868     0]\n",
      " [  404 19831]]\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "model_late_LR=LogisticRegression()\n",
    "model_stats(model_late_LR,x_late_train,x_late_test,y_late_train,y_late_test,'预测是否延迟订单late')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6其他模型进行预测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 189,
   "metadata": {},
   "outputs": [],
   "source": [
    "#除了逻辑回归之外 还可以试试其他的模型\n",
    "from sklearn.naive_bayes import GaussianNB,BernoulliNB\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.tree import DecisionTreeClassifier,DecisionTreeRegressor\n",
    "from sklearn.ensemble import RandomForestClassifier,RandomForestRegressor,GradientBoostingClassifier,GradientBoostingRegressor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.1高斯朴素贝叶斯"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 207,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "使用的模型：GaussianNB(priors=None, var_smoothing=1e-09)\n",
      "准确率：0.880591640583885\n",
      "召回率：0.15964912280701754\n",
      "Auc：0.5798245614035088\n",
      "F1-score：0.27534039334341903\n",
      "混淆矩阵：\n",
      "[[30973     0]\n",
      " [ 4311   819]]\n",
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n",
      "使用的模型：GaussianNB(priors=None, var_smoothing=1e-09)\n",
      "准确率：0.5718915325596211\n",
      "召回率：0.5619916683197779\n",
      "Auc：0.7809958341598889\n",
      "F1-score：0.7195834391668784\n",
      "混淆矩阵：\n",
      "[[  816     0]\n",
      " [15456 19831]]\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "model_fruad_GS=GaussianNB()\n",
    "model_late_GS=GaussianNB()\n",
    "model_stats(model_fruad_GS,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'预测是否欺诈订单fraud')\n",
    "model_stats(model_late_GS,x_late_train,x_late_test,y_late_train,y_late_test,'预测是否延迟订单late')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.2伯努利朴素贝叶斯"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 193,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 due to no true samples. Use `zero_division` parameter to control this behavior.\n",
      "  _warn_prf(average, modifier, msg_start, len(result))\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "使用的模型：BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)\n",
      "准确率：0.9773149045785668\n",
      "召回率：0.0\n",
      "F1-score：0.0\n",
      "混淆矩阵：\n",
      "[[35284   819]\n",
      " [    0     0]]\n",
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n",
      "使用的模型：BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)\n",
      "准确率：0.7001357227931196\n",
      "召回率：0.8184454346134804\n",
      "F1-score：0.6813070356196643\n",
      "混淆矩阵：\n",
      "[[13705  8259]\n",
      " [ 2567 11572]]\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "model_fruad_BNL=BernoulliNB()\n",
    "model_late_BNL=BernoulliNB()\n",
    "model_stats(model_fruad_BNL,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'预测是否欺诈订单fraud')\n",
    "model_stats(model_late_BNL,x_late_train,x_late_test,y_late_train,y_late_test,'预测是否延迟订单late')\n",
    "# Results are poor: the fraud model never predicts fraud (recall and F1 are 0), and late-delivery accuracy is only about 0.70"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.3 SVM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 208,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "使用的模型：LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
      "          intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
      "          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
      "          verbose=0)\n",
      "准确率：0.9776749854582721\n",
      "召回率：0.5254901960784314\n",
      "Auc：0.7531908690724671\n",
      "F1-score：0.2495344506517691\n",
      "混淆矩阵：\n",
      "[[35163   685]\n",
      " [  121   134]]\n",
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n",
      "使用的模型：LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
      "          intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
      "          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
      "          verbose=0)\n",
      "准确率：0.988809794199928\n",
      "召回率：0.9800345935260687\n",
      "Auc：0.9900172967630343\n",
      "F1-score：0.9899166375480458\n",
      "混淆矩阵：\n",
      "[[15868     0]\n",
      " [  404 19831]]\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# Linear SVM (LinearSVC) with default hyperparameters for both tasks\n",
    "from sklearn import svm\n",
    "model_fruad_SVC=svm.LinearSVC()\n",
    "model_late_SVC=svm.LinearSVC()\n",
    "model_stats(model_fruad_SVC,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'预测是否欺诈订单fraud')\n",
    "model_stats(model_late_SVC,x_late_train,x_late_test,y_late_train,y_late_test,'预测是否延迟订单late')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 209,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Suppress library warnings (such as the UndefinedMetricWarning shown earlier) in later cells\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 210,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('预测是否欺诈订单fraud',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.9782289560424341,\n",
       " ('预测是否延迟订单late',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.988809794199928,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.880591640583885,\n",
       " ('预测是否延迟订单late',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.5718915325596211,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.9776749854582721,\n",
       " ('预测是否延迟订单late',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.988809794199928}"
      ]
     },
     "execution_count": 210,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Metric dicts collected so far: accuracy_list, recall_list, auc_list, f1_list\n",
    "# Show the accuracy recorded for each (task, model) pair\n",
    "accuracy_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.4 Decision Tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 211,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "使用的模型：DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
      "                       max_depth=None, max_features=None, max_leaf_nodes=None,\n",
      "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                       min_samples_leaf=1, min_samples_split=2,\n",
      "                       min_weight_fraction_leaf=0.0, presort='deprecated',\n",
      "                       random_state=None, splitter='best')\n",
      "准确率：0.990776389773703\n",
      "召回率：0.7892857142857143\n",
      "Auc：0.892430906940095\n",
      "F1-score：0.7992766726943942\n",
      "混淆矩阵：\n",
      "[[35107   156]\n",
      " [  177   663]]\n",
      "预测:预测是否欺诈订单fraud --------------------------------------------------------------------------------\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n",
      "使用的模型：DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
      "                       max_depth=None, max_features=None, max_leaf_nodes=None,\n",
      "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                       min_samples_leaf=1, min_samples_split=2,\n",
      "                       min_weight_fraction_leaf=0.0, presort='deprecated',\n",
      "                       random_state=None, splitter='best')\n",
      "准确率：0.9941556103370911\n",
      "召回率：0.9949545913218971\n",
      "Auc：0.9940688328469708\n",
      "F1-score：0.9946785705278556\n",
      "混淆矩阵：\n",
      "[[16172   111]\n",
      " [  100 19720]]\n",
      "预测:预测是否延迟订单late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# Try a tree-based model\n",
    "model_fruad_DT = DecisionTreeClassifier()\n",
    "model_late_DT= DecisionTreeClassifier()\n",
    "model_stats(model_fruad_DT,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'预测是否欺诈订单fraud')\n",
    "model_stats(model_late_DT,x_late_train,x_late_test,y_late_train,y_late_test,'预测是否延迟订单late')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 222,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[22 12 13  5 29 20 26  7  4 23 19 14  9 18  3  2 21 10 24 30 31 11 32 16\n",
      " 25  6 15 27  0 17 28 33  8  1]\n",
      "[0.06026826 0.11754478 0.01278335 0.01222896 0.00552459 0.00246441\n",
      " 0.04827792 0.00455    0.08559157 0.01065351 0.01901539 0.03458386\n",
      " 0.00116472 0.00205014 0.00791084 0.05205559 0.04439777 0.06447735\n",
      " 0.01170716 0.00591727 0.00298304 0.01410816 0.00079868 0.00555837\n",
      " 0.02480929 0.04511529 0.00330641 0.05717502 0.06651301 0.00282474\n",
      " 0.02654672 0.03311651 0.0356809  0.07829642]\n"
     ]
    }
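   ],
   "source": [
    "# Inspect the tree's feature importances to guide feature selection\n",
    "# (argsort returns column indices ordered from least to most important)\n",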
    "important_col=model_fruad_DT.feature_importances_.argsort()\n",
    "print(important_col)\n",
    "print(model_fruad_DT.feature_importances_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 223,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>特征</th>\n",
       "      <th>重要程度</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Order Item Quantity</td>\n",
       "      <td>0.000799</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Department Id</td>\n",
       "      <td>0.001165</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Department Name</td>\n",
       "      <td>0.002050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Category Name</td>\n",
       "      <td>0.002464</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>order_year</td>\n",
       "      <td>0.002825</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Order Item Product Price</td>\n",
       "      <td>0.002983</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Product Name</td>\n",
       "      <td>0.003306</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Customer Country</td>\n",
       "      <td>0.004550</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Category Id</td>\n",
       "      <td>0.005525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Sales</td>\n",
       "      <td>0.005558</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Order Item Discount Rate</td>\n",
       "      <td>0.005917</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Market</td>\n",
       "      <td>0.007911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Customer Segment</td>\n",
       "      <td>0.010654</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Order Item Discount</td>\n",
       "      <td>0.011707</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Benefit per order</td>\n",
       "      <td>0.012229</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Days for shipment (scheduled)</td>\n",
       "      <td>0.012783</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Order Item Profit Ratio</td>\n",
       "      <td>0.014108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Customer State</td>\n",
       "      <td>0.019015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Order Region</td>\n",
       "      <td>0.024809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>order_month</td>\n",
       "      <td>0.026547</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>order_weekday</td>\n",
       "      <td>0.033117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Customer Zipcode</td>\n",
       "      <td>0.034584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>order_hour</td>\n",
       "      <td>0.035681</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Order Country</td>\n",
       "      <td>0.044398</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Order State</td>\n",
       "      <td>0.045115</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Customer City</td>\n",
       "      <td>0.048278</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Order City</td>\n",
       "      <td>0.052056</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Shipping Mode</td>\n",
       "      <td>0.057175</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Type</td>\n",
       "      <td>0.060268</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Order Id</td>\n",
       "      <td>0.064477</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Customer Full Name</td>\n",
       "      <td>0.066513</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>late_delivery</td>\n",
       "      <td>0.078296</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>Customer Id</td>\n",
       "      <td>0.085592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>Days for shipping (real)</td>\n",
       "      <td>0.117545</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               特征      重要程度\n",
       "0             Order Item Quantity  0.000799\n",
       "1                   Department Id  0.001165\n",
       "2                 Department Name  0.002050\n",
       "3                   Category Name  0.002464\n",
       "4                      order_year  0.002825\n",
       "5        Order Item Product Price  0.002983\n",
       "6                    Product Name  0.003306\n",
       "7                Customer Country  0.004550\n",
       "8                     Category Id  0.005525\n",
       "9                           Sales  0.005558\n",
       "10       Order Item Discount Rate  0.005917\n",
       "11                         Market  0.007911\n",
       "12               Customer Segment  0.010654\n",
       "13            Order Item Discount  0.011707\n",
       "14              Benefit per order  0.012229\n",
       "15  Days for shipment (scheduled)  0.012783\n",
       "16        Order Item Profit Ratio  0.014108\n",
       "17                 Customer State  0.019015\n",
       "18                   Order Region  0.024809\n",
       "19                    order_month  0.026547\n",
       "20                  order_weekday  0.033117\n",
       "21               Customer Zipcode  0.034584\n",
       "22                     order_hour  0.035681\n",
       "23                  Order Country  0.044398\n",
       "24                    Order State  0.045115\n",
       "25                  Customer City  0.048278\n",
       "26                     Order City  0.052056\n",
       "27                  Shipping Mode  0.057175\n",
       "28                           Type  0.060268\n",
       "29                       Order Id  0.064477\n",
       "30             Customer Full Name  0.066513\n",
       "31                  late_delivery  0.078296\n",
       "32                    Customer Id  0.085592\n",
       "33       Days for shipping (real)  0.117545"
      ]
     },
     "execution_count": 223,
     "metadata": {},
     "output_type": "execute_result"
    }
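   ],
   "source": [
    "# Pair each feature with its importance (ascending). Identifier-like columns such as Customer Id\n",
    "# and Order Id rank near the top, which may mean the tree is memorising IDs rather than generalising\n",
    "feat_importance=pd.DataFrame({\"特征\":x_fraud.columns[important_col],'重要程度':model_fruad_DT.feature_importances_[important_col]})\n",
    "feat_importance"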
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 229,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAApkAAAH1CAYAAAC9ceXuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAABZKklEQVR4nO3de7xtc73/8ffbnU0utckm4aSiootbIpcupFMipVI60dGVSp1uxymqU1JHN5WUhKSjJAkVNiKKTSR1ShcqJ9oi+4eOis/vj++Ye829rLX2GmN8xppzzf16Ph77sdZce47v+s6x5hjjM7/j+/l+HBECAAAAMi036A4AAABg9BBkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSEWQCAAAg3QqD7kBbe+yxR3znO98ZdDcAAACWRZ7sP2b9SObtt98+6C4AAABgnFkfZAIAAGD4EGQCAAAgHUEmAAAA0hFkAgAAIF1nQabtE2xfbvvwKZ6znu1L+x5vZPti2/NtH2970owlAAAADK9Ogkzb+0haPiJ2kDTP9mYTPGdtSSdJmtP349dIel1E7CbpEZKe0EX/AAAA0K2uRjJ3kXR69f18STtO8Jz7Je0naVHvBxHx7xHx8+rhQyWxPhEAAMAs1FWQOUfSLdX3iyStN/4JEbEoIu6aaGPb+0m6ISL+d5L/P9j2AtsLFi5cmNVnAAAAJOkqyLxb0qrV96vX+T22N5X0Nklvnuw5EXF8RGwdEVvPnTu3TT8BAADQga6CzKs1dot8K0k3TWejap7maZIOnGyUEwAAAMOvqyDzm5JeYfsYSS+WdIPtD0xju3dK2kjSp6os85076h8AAAA65IjopuEyKvksSd+PiFs7+SWStt5661iwYEFXzQMAAGByky43uUJXvzEi7tRYhjkAAACWIZ0FmTNt4We/3Gi7ua97eXJPAAAAQFlJAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAus6CTNsn2L7c9uFTPGc925f2PV7R9rer7Q7sqm8AAADoVidBpu19JC0fETtImmd7swmes7akkyTN6fvxIZIWVNv9s+01uugfAAAAutXVSOYukk6vvp8vaccJnnO/pP0kLZpku8slbd1N9wAAANClroLMOZJuqb5fJGm98U+IiEURcVfd7STJ9sG2F9hesHDhwqQuAwAAIEtXQebdklatvl+9xu+Z1nYRcXxEbB0RW8+dO7dVRwEAAJCvqyDzao3dIt9K0k0dbwcAAIAhskJH7X5T0qW250l6jqSX2P5AREyaaV45SdK5tneStIWkH3XUPwAAAHSok5HMiFikksTzQ0m7RsR1kwWYEbFL3/c3S3qWpB9IemZE3N9F/wAAANCtrkYyFRF3aixTvM52/9tkOwAAAAwPKv4AAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADS
EWQCAAAgHUEmAAAA0q0w6A4Mk4XHHdt427mvfWNiTwAAAGY3RjIBAACQjiATAAAA6QgyAQAAkI4gEwAAAOkIMgEAAJCOIBMAAADpCDIBAACQjiATAAAA6QgyAQAAkI4gEwAAAOkIMgEAAJCusyDT9gm2L7d9+HSfY3tt2+favtT2cV31DQAAAN3qJMi0vY+k5SNiB0nzbG82zee8QtKXI2InSWvY3rqL/gEAAKBbXY1k7iLp9Or7+ZJ2nOZz/izpMbbXkvQISb+bqHHbB9teYHvBwoUL83oNAACAFF0FmXMk3VJ9v0jSetN8zmWSNpN0qKT/kXTnRI1HxPERsXVEbD137tzMfgMAACBBV0Hm3ZJWrb5ffZLfM9FzPijptRHxPpUg81Ud9Q8AAAAd6irIvFpjt8i3knTTNJ+zmqQn2F5e0naSoqP+AQAAoEMrdNTuNyVdanuepOdIeontD0TE4VM8Z3tJv5J0oqRHSrpC0mkd9Q8AAAAd6iTIjIhFtneR9CxJR0fErZKuW8pz7pJ0paTHddGnmfTHz/x7423Xf/1/JvYEAABgMLoayVRE3Kmx7PHGzwEAAMDsQ8UfAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAOoJMAAAApCPIBAAAQDqCTAAAAKQjyAQAAEA6gkwAAACkaxVk2t41qyMAAAAYHVMGmbaXt/0N2yvaPqv6Wf827+u0dwAAAJiVpgwyI+J+SatI+g9Jm9k+TNIptl9oexVJf5yBPgIAAGCWmc7t8gckXSLpz5L+SdJqkjaXdKqk+d11DQAAALPVpEFmdYv8u5IeiIgLJd0u6RZJIelkSTtKWjAjvQQAAMCsssJk/xERf7d9qKSP2j5R0pMlrSHpPknHS9pf0t4i0OzMbz/1gsbbbnLIN9P6AQAAUNfS5mT+QmXk8khJv5B0kEpg+vyIuEDSxl13EAAAALPPdOZkzlFJ/rlU0l8l/UdE/K36v7u66hgAAABmr0lvl/f5raS3qASk/yXpIbbXkPQHScd12DcAAADMUksNMiPi1RP93Pbm4nY5AAAAJrC0xdjXs/1Q2+tWj3eqvq4cET+XtNMM9BEAAACzzNJGMn8u6TpJj7O9iaQjbT9b0rclPUvS9h33Dwl+fNzzGm/7pNeendgTAACwrFha4s91EbGrpB9Uj1eRtIOkNWw/XSUpCAAAAFjC0oLMGPd1TUm7Slqn+rpmR/0CAADALLa0IHMT2++R9Njq8R8lvU/S/0bEkSoZ5gAAAMASljYn85Uqa2NeJunvkr4oaVtJb6z+/4HuugYAAIDZaqra5btK2i4irpK0lUqt8g0k7SXpBkmKiGfPRCcBAAAwu0w1knmDpC9XyxZtqjL/8u+SfiXpLNsrSlo1InbpvJcAAACYVSYNMiPiT7Z/FRHPt/1mSTdKWlelfvmJEXHaDPURAAAAs8zS5mQeUn09R9JdVeB5lqRX2l4+Iu7vtnsAAACYjaaak7mcpNUkKSJujIg/Vf/1+oj4WETcb/vhU2x/gu3LbR9e9zm2P2O7+QriAAAAGKipljCypH+XJNuftX2s7a0kPb362UMknTLhhvY+kpaPiB0kzbO92XSfU80BfXhEUGoGAABglppqTub9tv9ePdxK0kckbayxhdmfI+nrk2y+i6TTq+/nq2Sm37i059i+SdLnJZ1re6+IOGuixm0fLOlgSdpoo40mewlI9v3PP7fxtk//13MSewIAAIbd0hZj39T26yWtLWnr6meb2/68pH00FiSON0fSLdX3iyStN83nHCDpZ5KOlrSt7UMm2E4RcXxEbB0RW8+dO3cpLwEAAAAzbWlB5j2Sfq2yIPvvq+f/VGUE8wkRceck290tadXq+9Un+T0TPedJko6P
iFslfVmldCUAAABmmaUFmctLWrl63sqS3i9pLZUA8R+2d5lku6tVbpFL5Vb7TdN8zq9U1uSUysjpzUvpHwAAAIbQ0pYwWkXSI1QCzHUl/aekT0j6nKS3SNpP0sUTbPdNSZfanqcyd/Mltj8QEYdP8ZztVcpUftH2SyStKGnfRq8KAAAAA7W0IPM3EfFp26+QdL3KbfOrJZ2pkqxz2EQbRcSiapTzWZKOrm5/X7eU59xV/deLGr0SAAAADI1Jg0zby0taqXp4ucp8yVMkOSKOr56zqm1HRIzfvpqvOVli0LSfAwAAgNlnqpHMkHSMbUs6LyLOlyTbV/U95z0TBZgAAABYtk21TuYDtq+oHr7F9oUR8YCkI2yvLOl1KnMpvz8D/cQIOfeEPRtvu+dB5yb2BAAAdGVp2eVfq0Yq75f0FdsLVRZVf7pKgHlfx/0DAADALLS0xJ+FVflIqQSXD5V0tkoW+ANddgwAAACz19KCzF+rlG/cUGU08+GStpU0T9IGYiQTAAAAE1ja7fJbJT2set6KKouzr6CxBdqXtj0AAACWQUsbyfyzpN9JepykH6sEmpdLukHSYyX9vdPeAQAAYFZa2kjkXyWtU33/WJW1Mt+gsRFNAAAA4EGWFijeqLIg+ysl7ddbE9P2ipLWl7R/t90DAADAbDRlkBkRv64WY/9o/6LrEfF3SSfZ/lnXHQQAAMDss9Rb3lVweeEk/3fVRD8HAADAso3scAAAAKQjyAQAAEA6gkwAAACkYxkizFqnn7hH421f/KrvJPYEAACMx0gmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANJR8QfLvC+e9OzG2x74yu8l9gQAgNHBSCYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSsU4mkORTp+7eeNtD9v9uYk8AABg8RjIBAACQjpFMYMh84L+bjYgevh+joQCA4cFIJgAAANIRZAIAACAdQSYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACBdZ0Gm7RNsX2778LrPsb2e7R931TcAAAB0q5Mg0/Y+kpaPiB0kzbO9Wc3nfFTSql30DQAAAN3raiRzF0mnV9/Pl7TjdJ9jezdJ90i6taO+AQAAoGNdBZlzJN1Sfb9I0nrTeY7tlSS9R9I7p2rc9sG2F9hesHDhwqQuAwAAIEtXQebdGrvdvfokv2ei57xT0qcj4i9TNR4Rx0fE1hGx9dy5c3N6DAAAgDRdBZlXa+wW+VaSbprmc54p6Q22L5b0RNtf6Kh/AAAA6NAKHbX7TUmX2p4n6TmSXmL7AxFx+BTP2T4ivtL7T9sXR8SrO+ofAAAAOtTJSGZELFJJ7PmhpF0j4rpxAeZEz7lr3P/v0kXfAAAA0L2uRjIVEXdqLHu88XMAAAAw+1DxBwAAAOkIMgEAAJCOIBMAAADpCDIBAACQjiATAAAA6TrLLgcwWId8Y49G231qn+8k9wQAsCxiJBMAAADpCDIBAACQjtvlAKb0nLNe2mi78/Y6LbknAIDZhJFMAAAApCPIBAAAQDpulwOYEXt+812Ntjv3BR9K7gkAYCYwkgkAAIB0BJkAAABIR5AJAACAdASZAAAASEeQCQAAgHQEmQAAAEhHkAkAAIB0BJkAAABIR5AJAACAdASZAAAASEeQCQAAgHQEmQAAAEhHkAkAAIB0BJkAAABIR5AJAACAdASZAAAASEeQCQAAgHQrDLoDAFDHnmd+uNF25+79juSeAACmwkgmAAAA0hFkAgAAIB1BJgAAANIxJxPAMum53/h0o+3O2ecNyT0BgNHESCYAAADSEWQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANIRZAIAACAdQSYAAADSEWQCAAAgHUEmAAAA0nUWZNo+wfbltg+f7nNsr2n7PNvn2z7T9kpd9Q8AAADd6STI
tL2PpOUjYgdJ82xvNs3n7C/pmIh4lqRbJe3RRf8AAADQra5ql+8i6fTq+/mSdpR049KeExGf6fv/uZL+1FH/AAAA0KGubpfPkXRL9f0iSevVeY7tp0paOyJ+OFHjtg+2vcD2goULF+b1GgAAACm6CjLvlrRq9f3qk/yeCZ9jex1Jn5J04GSNR8TxEbF1RGw9d+7ctE4DAAAgR1e3y69WuUX+Q0lbSfrFdJ5TJfqcLuldEXFzR30DgDT/fMYJjbb79gsPSu4JAAyXrkYyvynpFbaPkfRiSTfY/sBSnnOOpIMkPUXSv9u+2PZ+HfUPAAAAHepkJDMiFtneRdKzJB0dEbdKum4pz7lL0merfwAAAJjFurpdroi4U2PZ442fAwAAgNmHij8AAABIR5AJAACAdASZAAAASEeQCQAAgHQEmQAAAEhHkAkAAIB0BJkAAABIR5AJAACAdASZAAAASNdZxR8AwPT989dPbbTdt/fdP7knAJCDkUwAAACkI8gEAABAOoJMAAAApGNOJgCMkOd9/YxG25297wuTewJgWUeQCQB4kOd//ZxG231r3+cm9wTAbEWQCQDozN5nXNRouzNfuGtyTwDMNOZkAgAAIB1BJgAAANJxuxwAMPReeMZVjbY744XbJPcEwHQxkgkAAIB0BJkAAABIR5AJAACAdASZAAAASEeQCQAAgHQEmQAAAEjHEkYAgGXGfmf8stF2//3CRyf3BBh9jGQCAAAgHUEmAAAA0hFkAgAAIB1BJgAAANKR+AMAQE2Hn3lLo+0+sPcGyT0BhhdBJgAAA3L8N/7UaLuD91l38fdnfv32xr9/730f1nhbYGm4XQ4AAIB0BJkAAABIx+1yAACg+acubLztbvvPTewJRgUjmQAAAEhHkAkAAIB03C4HAABpFnyxWca8JG194LpLfxJmDYJMAAAwdG489rbG2272xvUSe4KmCDIBAMDI+uPRf2i87fpv33Dx97cec0Pjdh5+2OMabzubMScTAAAA6RjJBAAAmCG3feJHjbdd703bJfake4xkAgAAIF1nI5m2T5C0uaRzI+ID033OdLYDAABYlv3pUxc23nbdQ56R2JPJdRJk2t5H0vIRsYPtz9jeLCJuXNpzJD1hadsBAAAgx58+/a3G2677hudP+f9d3S7fRdLp1ffzJe04zedMZzsAAAAMOUdEfqPllvcnI+I628+W9OSIOGppz5G02dK2q7Y9WNLB1cPHSPrFUrr0MEm3t3tVtDND7QxTX2iHdmiHdmhntNsZpr7M1nZuj4g9JvqPruZk3i1p1er71TXxiOlEz5nOdoqI4yUdP93O2F4QEVtP9/m0M7h2hqkvtEM7tEM7tDPa7QxTX0axna5ul1+tsVvdW0m6aZrPmc52AAAAGHJdjWR+U9KltudJeo6kl9j+QEQcPsVztpcUE/wMAAAAs0wnI5kRsUglieeHknaNiOvGBZgTPeeuiX6W1KVp31qnnYG3M0x9oR3aoR3aoZ3RbmeY+jJy7XSS+AMAAIBlGxV/AAAAkI4gEwAAAOk6Kys529neaLL/i4jfzWRfJmJ7TUl/i4i/1tzu6ZP9X0R8v3XHGrC9hqRXSnqopCsl/T4iftqgnRdL+mZE/K1lf7aNiCvbtNHX1uMlbSDpdyqv6+4GbaS8rkwZr2vY2F5Z0pMkrdT7WdNjwvbakuZJukPSbRHxQMN2HiPpsZJ+HhG/bNLGsLO9ckTcNyztzHbZ5/jk82HWcZHSTobMvrS4rqfGK5l/c4LMyR1Zfd1c0hqSrlMpe3mfpNZrTzVl+xWS3qEyCn2s7U0i4t9qNLFr9XVnSf+QtEDSE1Ve404N+mNJz5W0nqSfSbo5Iv63ZjOnSzpX0u6Svq4y0fhpdfuicjG+2Pb1kk6OiB80aEOSXmv7E5K+JemUiPhDk0Zsf0rl5LOJpP+Q9GFJU9fgmljW60qR+LoyPxhkBIgXSrpW0sJeE5KaXJTfIWlvSaup7JvdJR3QoJ23Vdsu
kHSI7fMi4r/qtpPF9rUR8cSEdj4y7px1iRqsJJLVzgTt7hgRlzXY7hmSPihpZUknSVJEfKxBO8uprBN9r8o5eUFE/L8aTaSe45V3Psw6LlLayZD4mtpe17PjlZS/uSQpIkbqn6SNJB0u6YvVv09Lem6L9s6XtFz1/fKSLhzw67tC0oqSLqoeX96wnQvHPZ7fsJ3TJf2Xygjk05q0I+nS/j5IurjlPtpZ0q8l3SjpXxq2saKkF1RtXCjpWQ3auHjc67pkCF7XeW36kP26JL1H0uWSPifpaS3auUzSsZLeW/17T4M2vt923/T6Un29qP9x03aq7y3pBw3beZqkN0r6N0kvkbROw3beLOnQFvvlIZIeWf29N6r+bV53/2S109fe+eMeX9qwnR9JWkfSRSrBwo8atvN1Sc+W9ClJp0q6oGE7Kef4atuM82HqcZHQTsa5MKsvWdf1tHgl428eEaM1kll9qpgn6bMR8T/Vz+ZI2tf2VyW9NSJuqdnsapKeW40ibVE9btPHtiM3f5X01KqtR0q6p2E7D9g+VNJPVF5XU3Mj4sW250fED6pP4XVdaHu+pE1tnyjp0iYdsb2fpJeqfJL7sKQzVEZIv1Szne0k7a+yn79W/Tte5QCuY6Ht90ha2/YrJd1ac/tef1JeV+V623tFxFlN+lJJeV2SFBHvk/Q+2ztLOtn2A5L+MyK+VLOpByLijU37UTnf9lEqo1D3VP1rMjVmke0DJK1Sva6/NOzP/9neXuUD3PaS/q/OxrZ3kvQmlfftZSrnjk0kfdD2nyQdGRH312hyL0nr235Z1VZExG41tt9V5aK1saQjVALneyW9rUYbae3Y3lJl9HuD6u8lSXNUcz/3+ZvKMRoqleuatvOwiPie7cMiYg/bTe9cpJzjE8+HWcdFVjsZ58KsvmRd11PilcS/+egsYWR7Y0lPjIhvTvL/D1OJxE9r0O7bVU7ON0n6aET8ukU/3yNpD0mNbn3a3kzS0ZIerVKz/Z3RYK6W7bVU6r9vIulmSZ+PiD83aOd4lWkXO0g6TdIGEXHw1FtN2M5WKnXo/yciflJ3+6qNI1T26W/6frZFRPysZjtfknSyyqfKqH62W0TMr9nOqir7+DGS/kfSFyLi3jptVO0coYTXVW13kUrAcr3KiaxuoJD2uqq2+gPo/1YVQEfEdtPcvjcX6ZUqAULjALH6gCOVIMGliTiwThtVO+tKepfG9s+HI+K2Bu1sLOkjKqN0N0h6R0TcNM1tN5H0OknvmiiQtP00SdtExMfr9qst2+dGxJ6Dbqc65zxR0jslfUjlb/5XlZHNOxu091SVO2ebqBxf74yIyxu0c7ak+1X+5j9QGT2esC70UtpZSznn+C8p53w4/rg4KiL+1KA/WcdXxrkw6zVlXdc3VkK8kvU3l0YoyJxtqk89X5RUa+TG9gskbalyQpTKgfG+Lvo4Xbb30thBdnZM801l+70qF/QHyXhNTedWTdDOpv0B3qD7MyjZCQVVm0eoRQDdFxhO0KVGAeLuKkHdTyPigrrbT9Jmyvun4e9OSwhImn/da+vhWnL+bKNkyox2bH8wIt7d5Pd3wfYqkraIiGuqQPimyCtMMjDVqOoTteS1q/YxWrX1yIi42fYzIuLCrD426EfKaxrG63q/NucwgswZljByc42kt6p80pU0uKzwqj/LqdxG21jSryLi7Brb7lx9+xaVW4ILVG5f7RQR/9ygL+dHxLP6Hl8aEU2SmU6JiFf0Pb48InYYYH/Oi4jn1N1ukrYeNCk9Ik6e5rbvrb7dWdLfJV2tKqGgyevqa3euyu1FqYyEX9G0rb42a58UbR8jaU2N3Z6+IyLe2uB3p7x/MvQF4Q9KCIiIWgkBtk+X9HuV5JG3SHp/3ZGfqp1vqtz9WJxg1fDCnNJO1VbGKhBvjb7ELNvfi4hnD6o/WWyfEBEHJbRzhaSXaclrV5MPBSdKuiEiPlrdSYuIeE3DPrU69yS+pmG7rqedw0ZqTqa0eAh8
fOTcu/U17ROi7fOnaKfRiaOyuaTDxo3cvKrG9r9USZa4qXpcKwPW9o2a/HU9ukY/er4q6TaVWw572n5pRLxsOhtGxCVVn94XER+sfvw927VugWXNrapGfTaR9Li+Ubs5KgHVjPenT8bcocXdq76uqjJt43aV2yJLFRFHSpLtCyNi98UNlvm0zTpjn6Cyz9dWmVcXknZs0M4SJ0VJX1aZwlHHUyKi98Hnc7ZrneQT3z8HRMTJE4301x3diIhXVW2eL2mHiHjA9vKSvlennUrG/OteO01Wj+ikHeetlvBclSTI3pSSOYPoTwfneNveJiKuarBtv9skXaBy695VH2t/SJH06N77OiIOrq75tSWde7JeU9vrekq8knUO6zdyQWZE7Lr0Z03Lq5PaWUJEHDHBz+rMrZunMv+szoT9/t+1WZPtprBuRLy496DhAf9z299QmaD+OEm/qrm9J/j6Z0kvnvjpk9pE0i4qJ51dNDZHq+7oSFZ/erZRWcKm8dyhnog4qe/hcbY/06CZ/oSCxzXpR59HqgS7p6qM8Ned55V5Ulxk+6UqWcLbS6p7izLr/XNd9fXimttNJSMh4EbbX5Q0rwqAm67beabt16hMk6i1HmBH7TwhInapgudzbL+9zsYuyW//Immr6gNXLxHpI4PoTwfn+JVUkuK+p7HzT5MR4xVVXluj+dt97qjuCF6pcm5s2l6rc08l6zW1uq4rL17JOoctxu3yGdb21qft36h8Yln8yalpwJHB9rkqn7iulrStykF/TN2hftvbqtxyvzkiftSwLylzq2x/seltty76k2ncvMqHSPq3vtG76baxlkpCwcYq78MvRIOEgqqtc1SWHvpXlQzGd0fEE2psv7PKyfBfJJ2osZPiGVFzwrvtdVQm8W8h6acqk/ibJICkvH8yOS8hoNH863Ft9D6I9idYNbntPv4DbdN2vqZyJ2ZvSR+XtGdE7NekPxmDHFn9yeKS7byEiLi5QTtXS1pFZfSv106Tv9dDJb1bVZECleP09gbttDr3VG1kvaZhu66nncNGNsisbgk9WUvOt6iVWd4F20errHeXcetTtudFw8n3Sb//vRP8OOrc1rP9CJXMus1VLu7viojfJ3VxoDqab9g4gWjcLdi/qQQKN9RsYwVJB2ns7/WliPhHw/7MkbS+ysjjQSqZvbWXsBrGwK7H9voR8ceEdlpVtHGpTLKBSmWSW2OalUkmmsfbM935vOPa6w9aQuU6VDto6WtvLZUqKU1XOOhfLeEXKlnYTVaBeEhELGrShyn602r1hi40me/ct+26WvJ8WDv7PkviuSf9NQ36up5plIPMMyX9P5VP7v8rae2WcylTuOWyCbbfL+l5GluP7Z6I2KqLvk6zP60rtlTzSd6vcpvyqZL+PfoSZmarieb8REST+YYpCUTVtiuq3PpoHCDaPkXlVmkvQeZR4+ZDTqeN7NJ3q6gkIbUuCdlWdYw+X6Vqi1SO0S0btLNERRvbP4yIRhVtvGRlkqMk7RER06pMUt0OlkqCw80aS9DbLCKe2aAvJ6qcu1ZTSSL6SZO7O7ZfrrL80HIqI1J1q6T02pkTEff0Pf7niPh2g3aeL+m1KiNbVrnmPLFuO8Nm/HxnN0+EPEHl7sc6anE+zOSWCVZZr2nYruuZRm5OZp81Je0r6fSI2M92owW+syXcTtlRpYLHF1TmYXyrdafa6S95eFLDT3Gr9QUEF9v+zzob2z4mIg7zkklfTZK93h4RR/ddBBdrOErWdr5hdgKRVJbN+qWk81QCxBMl1QoQJW3Yd9H5ru2LG/Qju/TdBWpYEnKSv3vjdTJVjtEd1PAYtf0QlQ8mT/PYEkRzVPZTU8+LiO2rW7pfsf366W7Ym8frkpC0eA1cN0z4iippo2pjjqSPNmlH0htUjo/vRcRnbDcdQTqrmou7saQPqNyCrR1kqkxHOFBlUfiPqyRyzFodJIE8UtJz1Px8mHqcOifhq9Vr6jMU1/UOroEjHWT+USXR
4j7b71KZfzYU2n56krSVyijJlpLmJnevlliyYsspblax5du2z5N0lcqczmkvg1T14bDqa9sAvpcUc0TLdnruk/QMlfJeL1IJHOrITiCSpEeMCxAvadDGH6tjqjfyXPu2TiyZqb74DkPTwEXtKv5k/92ldsdoVmWcfhmVSe5yWeapV0GmTj3tqTyi4XZZVVIOVcm2/72kg6J5neblqz7MUwlU/6nOxk5aGSVRdhJI2/Nh9nHaKsGq0vY19RuG63r6uXCUb5cvJ+mhKp+69lGZB/mLwfZqwk9Pr4mIOstTrF9tf7+kw1Tm1H2ti75Osz+t1v3sa+fxKqOiP4sGlWyGUeKcn7QEIttfUZmq0QsQHxfTXHKqr42VVCbL9xJkTmg6XaKaKnG2xgKXfRtOnP8Ptaz4k6U6RjdQGXk8TNK3I+L0Bu2kVMap2mpdJaWai7uPxirIfKPJ331cMPU3SadExKkN2mlVJWXcXNN5KvXd3yfp/xrONd1V5bbpKpLeK+lbEdHmg8FQyJrvXJ0PH65yXBykMgI9sOIUGQlWWa9p2K7rmUY5yEyrTpHJ9sV9n552s31JTCO7N3vyfVuu1m10i4ottn+rcrH5nUrlo8Vq3uZeTWWi/I3VJ9K3qYz8nBg1lzZxqbywMMo6gPtLujMizq3TxhRtN0rYqQKE3bTkfMNGf/O+ALFXqvALEVF3HcdVJD0+IhbYPkglSGgaZK6lnNJ3rUtC2j4wIr7Y93h9SWtFxM/r9mdcu1kVo1ZqM/d5mNh+rMY+VDZaCsktq6T0zTV9kFhyqa/ptLVK9fvv6/tZSsJXUxnz5at2Ws13dimlPKE6f69sbpFglfWaur6uJ557qPgznvOqU6RUO+hrr9GnJy9ZPi8kbSrp6ZJ+HQ3WRXP7pZTmt72FY/u1KheJTVU+wf1S0jWSrokaGc+2z5B0vqTvRsRvbW+jMtdvx4h4QY12TlRZiuKrEXGty3Itz1KpaDPpBWmK9rIq/lwj6Ssam29Y+yLY19a6Krd3eheMqHsys/0tlSWCTrL9bklPiogXNexPStJFtW2rkpC2T1aZy/3xiLjI9kkqyRu15mk5L1HitZJepTKtaQVJK0RE23VJG7N9bSQksth+q6TdVZY920blzscxDdoZiiopLklVvQ80R6gMarxN0mMiYtuGbbZelaIKhPZQud6cHBE/aNiXyzRuvnPNQKp/BZJQmQ6ys6SLB3lNbhOEZ72mDq7rWeceKv5MQ1Z1Cjun2kHPASqfnq5QuaBNq9pPjFXu2E1lDtEDKreumi6F1LaKzPa2x49A1KoqERHH2V5PZS7KVpJeqFKi8mqV+S3T9fCIOK6v3askXWV77xptSNI/RV9SQrVvznL9yi/ZCTuLIqJpcsR439GSAauneO5k1u4FuRHxQTesuFE5ywlJF16yJOT+tp8T9UtCPlrlmPqqpItUbnlP+wLk/ESJ/VWSCj6mklRyQt0Gkuf5fcn2oRHxybr9GGfvqDJwqztOl0mqHWSqZZWUnrYfuFXmSG+ucpv8tyrnr2OafNCp+pNSBSuWnC9/spvNl5fazXdWRBxZ3UF5uaTXqIwaPjkirm3YZNY1uXHSatZryrquZ517OjiHjXSQ2atOsb7bVadYSdIFtr+rdtUOep4r6fgGt3H/VWXO0O0qF51rqv9aXw0SL9S+isyPomWije3bJD1MJbC4RmVO5x0NmrrQJWHk3Gr71VVGIBfUbOfn1SfLc8a1UzcRIDth5zLbp6mUf+zNN2w6YpMRsP6hGr3pVdz4U4u2spIuWpWErNynUnVjddtbqZxs61Sfyk6UsMp8r4eqLNOzYd0G2h6j4+ylcj59mcrrahKoStL/2d5eY0tgNf3w1bZKSk/bD9x3RVlz9F7bN0T7ebQZlWjGz5f/sKr58pK+VLOp820fpYbznW1/UGW+4k+rftwqaTXbO9QJ7PqkVCCKFkmrWa8p8bqede6h4k8dHqtO8QuVSdhNqlOkVDvoa+8d
KsPqf1FZpuDs/tuFU2z3oOoWaj4i0ZrtN0TEp1u2Mdkt6Ca3cJ+qcgtuPUmLVBK96i4ds5yk/ca3o/KhoPaF0HkViHq3Zvr/5rXmMvV9Kn22StZr44DV9soqo/G9ihufj5qLhDs/6eJslRHaXkLTSyLieTXbWFdl6aHfaGxEfZuoMuFrtJOVKPF4lfehVUbszoiIT7Rtd9Bcqg99RGPzgt8RETc1aKdXsahVlRS3X7v4NyqJH1bJMu59P+27OuPaa12JpmrnCDWcLz+unVbzncfdEu7XKDjMuia7RdJq1mvKvq4nnnuo+DOTXCplzFMZ3botplkpYyltriHpzZLeFBEPa9tewz60XUoJk7D9lIi4urod+HKVW2mnNAxWX6m+k49Uf1K4J67MVDVVf/K97SepfOq9MSKub7B9WtJF1d46KtnTvcDlQxHxlwbttDrWba8bEX+qvt9J0ioRcX6DfjxGpYrNb/t+tnVE1B2d7237oMXvG3y4WFPSOzS2jz8SEXc17M+ciLjH9j9Fg/KWVRsHRcQJfY/nSVozWiZqDZLtp6jcet1A0ktUbsXuF4PNwt5ZDz7/DGLu62oqU0j+Kum0iGg1gp0VhA8TJxWlyGpHGuEg03mT1PsrZXxY0u4xzUoZk7T3fJV5Vhuo3C46YxAnRbdcSgmTc0kg+b+IONj2xyWtq3JbZZuIqDtPtD8gW1XlNtrtEfHqrP5Wv+PM6fbN9idV3jfXq5yIft5g/mN/e60/7Ng+tOpLf6Zx3ezyVse67Q9J2ioi9nRZR3R3lWk6K9Qc3fiEpCdIWlnSN1Ru2R8m6Y4m75+qzd6HjFVVpoDcGBEvqdnGWSqjPb3R4r0jYq8GffmQpOUi4h3VLc/LGn7QSUnUqtpKL//aoA/Z5422c0177bR+72So3n9XqtzKXTUi3tDR75n2uXDYuGWSVnY7vS1H8p/KKOGhCe1cVn29qP9xi/bepVKObdD75+Lq6/zq6yWD7tOo/Ot7rzxKZf3H3oe5+Untf6arPk/zuZdN9bjm7/2USuByjcp85W81bOcKlcB3o96/Bm20Otb7tp+nMkVn5erxxTXbubL6uoLKXK/PSHp04t96eUnHNdju+1M9rtHOD7Paqfb196vHF6jc7qzbzgkq8x5/rDI9ptU5vsXf5aLqa8p5Q2UN0b2S+9jovZP0u7/f931n16s658Jh+9f0WOqqnYgY6cSfrEnqGZUyFouID7XZPtFClyUu1q5Gym4ddIdGyJ0ua3XurVKTfY7tfZo2Nu5W50MkdbGMTZ1bGrdV85mukbStpN/b3iiaLX6eUXVDKktPXaC++Xkqa4vW0fZYv9v2virTIz4m6f7qb7dizXb+4bJGp1Xmh35AKreDo+Favx4rTymVOWi1l0eRdK3tz6mMZG6vMtLRxN22t1VJzNtazctltk3U6klJtEmQet5Q++ROSWnvnQyrV3PvXX2/eEmdaJZANJnZfHu3VZJWB+2M7u3yLH5wpYyjoppzNZu5xUK0mFo1d+gVkm6JiG/bfpRKPdpPRIPFmavbVb0D9W8qyWLTXkd0mr9j2uueTjLpPaLZJP7WVTeqds6R9KI27+G2x7rth0t6k8rf/dhqGsDhKmv01ln3tbfs0OJb/2qZ5Dfub3afpP+OiNpLT9l+rkplphuiYZEC25uqJP70EsfeEQ3mZSYmavUSbV4t6euS3hURW9btT1vZ543Efp2oJc8/jd47Sf2YSKNzzxS/p/Ua0IMywT5qel5OaUcaoSDTJWPx+THJGm7VAbtbRBxfs93Wc72qdo6JiMO85Lp1M54dPlECQE8MYDK3tPgNfVhE3Fk9fr6kTSPi44Poz7CxvaLKEhKbq8zR+lJENB39mex3XBS5y91M9/emfNixvUBlztjiUok1gublJT1TJdHmoupny0naJyK+Xrcvw2TcKNQSmo5MZLK9fLRI4GiTqOWyxJ1Uliq7T2VEdXmVOZnPaNqnvvZXjpqrLmRyKQPaO2/c
oBbnDbcsdDBJm40qoHWtzrnQSVWVJmi3eYWdsWPiTpVjou0SX3KL6lUjE2RKku2XSHqeyhDvj1QWst1EJUtvA0lvjZpJBbavkPQyLVlVYuAn56b6JnHvrLLA6tUqQfQa0aAaTVKfzlZJcvhxlGSAsyT9IyJeWLOdrInuqVWe2rJ9ikoCSW9dwUdFXzWG2S4j8WeKts+MpUzit/3fKreEVleZNvJrlRGkCyPizVl9GYS+EYnNVW51XqtSZeu+iNh6QH36d0k3qqyR+3ZJ50SDJI6ERK2vquyLv0u6TuVceI1KCeLa53jbH4mIf+t7/MOI2L5uO1mq88aNGpve0Oi84SULHWyvUmr3sAbtpFRAGybOq6qUVamn/5g4StIedY6JvnbeL+n5KufEkHRPRGxVtx1pxBZjj4ivVrc+9pL0XpVlY34n6cxosMxKJWOuV2tOqtzRu5Vk+8KI2L2v/UHNQ5Kkh0XEU6uAXirzDptUGGi7qHKPnVvlqa1H9J2Avmv7kqYNtQnEbb89Io4ed/usUZ3wvjbHr3LwYZWTW5a1pvGcR0TEDratUrHlM5J2igZLIA2bGKsocr6kp0XE/dXI7fcG2K09I+JpLgUuNpXUdD7d8yJi+2rk6VTbr6uzcVQZ0i7LX+0n6V9U5tL+QiUonxbbD1HJeH5a38jxHDWfa5plw3HnjYsbttOq0IHzK6ClyBiUiJZVlZxfYaf/mPiK7dc3bGdHlakoX1D5wF1rzel+IxVkSlJE/D9JX67+ZVhRJTlhoPMVO7iV+UA1FeAnKnOsBukHtq+UdIvL0h3z1CwRKWWiu/KrPLX1vy5L4vSWjrmlRVttAvHe+pVHtPj942Ul/kxmOrdqVulLKLhDpczhFrazEwoGaTVJe1bHxhbV40F5wPabVSqabKLmFXtaJWpVI9hbqswzvE7SaSojqzfV7Meukl6gUhr1CJX30b0q9csH6Y99543t1awynFT280s1dv6puzZqdgW0LK0HJdy+qlJ2hZ3MROWtVEYyt5Q0t2kjI3W7vAu2r1YZEa0912uY2V5LZS7cJiqjtJ+PiD8PtFOSbG+osq8fHxE/HlAfUipKZHGpkfuvGptbdULTOUBuWd0km5MSf6Zof6mT+GcqoWCQqjnrb1c53m+S9NFouAh6Ql8eo7Jc1SkqF9bfRMTVDdoZn6j14Yi4beqtltg+uxLNudG+pGSavvPGFirnjS80OW/4wYUOjurNn6/ZTkoFtCwZ50LnVVXKqtSTkqjssrrFBiqj8YepJJt+rVGfCDKnNkHSxUkRUXsoO+t29yiyvavK7ar+6gJNb7+mzO9rk1CQxSXb9GCVxY/PcVne5F5JJ0bEXwfQn+dIOn988oDtfSLiGw3b7HSVgzqT+Iddm2SAavu1VY6NOyTdOoj3dBYvWVnp6SprktaurJTNZYWB/vPYwObv235nRBzV93gLlalJ077dbfsFkhZGxA9s768yH7PpqgKrSnpcRCywfZBKBbTWCTNtj4th4LxKPVmJyq0rhC1uiyBzaqOadJGVJJPB9g0q86EWf+JqMnI4wfy+RlWM2iYU9LXTah/bPkPS+ZK+GxG/tb2NSsLWjhHxghbtNgrEbd+qMgr2gYj4dt/PGy/5Uc2hvrr6d01E/L5JO8PESYljWckA1batEwKcVEWtLS9ZWemdKokXtSsrddCvb6pMQeuvkjLI/nxFZc7rx6PkK5wmac50z4nVSO9tkr4aEdfa3kul4s8aETFpWdgp2vuWSoW7k2y/W9KTIuJFDdrJPC5aDUpkXUedV6knJVHZiVWeRm5OZgfSki7a6GAkNCtJJsPvVNbdazuClTW/r1VCQZ+2+/jhEXFc70GURKSrbDcueTZBIF4n0ebnkvaV9P5qQvlbo5RE9dSbTekgSc9WmVy+u+3bIuIRdRsZpg9Napk41kEygJSTEPAl24fGJMvELY3tAyLiZC+57qukxQkU07VTROzoUqv8VZK2jIj73DyxJcvciHja
gPvQb1OVBI5LJH1VpUxlnZHDf4oqcUySqvPYWa6Z+NNn7Yg4qWrrg9U1bdqyj4uW58KerOvoAxHxxpZtSEmJytG31qzLahCfbtohgsyl60+62F7tki4kNfv01MEtv6wkmQy3qFQTOVNj1QWa1EnNqmKUNXm67T6+0CXr/1yVW5yrq3yqXNCwP1LLQDzKvN3X236CpI/ZvlFl0ntT16ocW2dLOlT1ky56hulDU9vEsexkACnnPd22itp11deLG/zuflmVlbKdafs1KnP0Znw6ywTuUAkOVqlGIR+tkjk/XT+vRjPP0ZLnnz807M8fqhH1K1UqhdWdK5h9XGQMSmRdR7Mq7KQkKjuxyhO3y5fCSZOn+9pLuaU7SqqL3hIiovaIcdb8Pg9RlSeXrOfdJa0naZGkH0RE4+Uk3CLRxvaBEfHFcT/bR9JrI+LZDfszV+X1vVjlwvG7iHh8g3aGJqHJSYljTkoGqNoamvd0W06qrNRBv3ojc60rNCX1ZxWVqkq3qEwp+LGk7SPiC9PcfjmVufJLnH8kHR8RtZcfsr2yljw/H9/kWpp1XLQ5F2ZzXqWelERlJ1Z5Gtkg0/ZHI+JtfY+fplLJ4fSW7baqUmD74r5PT7vZviTG1iCbzvatJuRWJ44Xq3wCe4jKieNKSV+LWZoIMNE+6amzb/raS5k8XbXV2ULjDfszVOVEXZau+o3KBfDHKvMybx9Uf7J4CBLHxvVnZUlP1lhiQdQ9NmxbJSt8PUk/U1m0vNayOC6ZyturrF96h6QrIqLukjhDadyHi1C5vg5sVQppuM4/Hlsjs3cbVxFxcoN2LGmjiLjZ9jMi4sKG/Wl9LnRuVaUuKvXsFBGX1txme5WR5oeo3PG4QtIvoywPWb8PIxxkfl1l3bIvR8THq8crRM2ECSdXKWj76anthNzqE8oqKmsB3qsyp2UnSff2z7+ZTZxcxShx8vTQjFpnB+Jd8zQq9UywzbBM4k9JHMvkhMQC26dL+r3K+eItKiOH0x4lsd1fje0OSeuoLNL9koi4uE5fhlHf6M9qKvvoJ4OcJzxM55+qP71koVVVRlZvj4hXN2jnRJU5/B+1fbzKe/k1NbZPOxc6r6pSVqWeVvGK7Q+rzOO9XOWO0EaSXqGyrumOEfGrun0a5TmZ8yJi6ypg+LjKCW3aE4TdXZWCA1Q+PV2hUqqrVmCXMCH3URO86Y6tLkIzykn13CO/ilFWlaeuFxqvozen90GBuMoFcdisVefJE1xQBzmJv1XimCeurCSp+dJeykksmBsRL67ezz+o7orUcbikJ0fETb0fVHO/TtVwvgdr6f+QbnuOpI8OsDvScJ1/FFXST+U4259p2NSje/s6Ig52zQQi5Z4Ls6oqtUrM64tXNmwZr+wYfclrLktNXSzpdU0CTGm0g8zfuEy8/5vtN0l6lEpN4ulKrVIw7tNT79agJG0tqc7t7rYTchdWE4y/r7F6zbuo/iTs1qKqfxt5SU1ZVYyyqjxlJSK11kEg3rW6t1iGaRJ/2ySbtMpKfeeLjMSCG21/UdK86u7BL2tu/4Ck+1yywnv+IWn5mu3MFrVXSUg2NOcf6UHXwDUkPa5hU3e4VNrpJRDVOk8nnwszqyq1OWeMX+Gjabxyj0vy2qUq54kfqgTlja+FI3u7XFo8x+FelTfi/0jaOiLOq9lGSpWCrFu64yYI36eaE3KrT9iHSNpO5UBfpHKAHBsR90y3nWHkpCpGzps8PVTzH6s+na+Syd0LxPcdZHLCZFxz7c2201AyeYiSbJxf1WYvjb2us6PGBaTvjsX4C2LTYH6ojLsj8zeVxcZPHWB/hur803cNlMq16+xokKhl+6GS3q0lj6/a87gzzoXOq6qUVamnVbxiez1JR2rJ+OBKSUdGxB8btTmqQaZLdYPna8kV9JssbrqcymjfvSpD6QuaToCt2rswIp7R93haF1O3qKoy7GyvobJO4saSfiXpi4MMeN2yytMwz3/MCsS75pqVerIuqG3n
dVZttEocs71dRPyo7u9dSpv9yRLPjIgLGrSxnMoyRhtL+lVEnJ3Zx1Fg+7EqGd0/i4i6I72dcsuk1Ra/Nzsxs9defwJRk3bWUstzoROqKlXbpSWbDptRDjJ/qhKR91eRabIsztclHS/peSrzOteLiGe26FejT0/9wajtwyPiA037MGxsnyfpQpV98iRJT4+I5zZpJylxo1WVp6xR6772hmmhcTmpok2GzAvYBPM6m1aMapU45rGVJ5aYxN+GH5ws8UBEvLZmG6erjO5fr3KcrhkRL8voXxPD9D6UJNtvVVnu52qVqRfnRsQxA+xPStJq2/3cwfkwrRpNW25ZVamvnaxk06G6VkijPSfzNknfiPbLADwsIr5n+7CI2MP2D1q29yKVT0/7qXx6mm5Zrf5bTLtJGpkgU6XucG+S/Pds7z7lsyeXlbjRqspTB/Mf0xYaTzoJ2S0q2iT3J3MSf1aiRNvEsZVsH6glJ/FLarbkS6VtsoQkrRsRi+d4NWwjU9r7MMneEbGjtHjk+DJJMx5kOj9ptdV+zj4fRlI1mqRzYduqSj1ZyabDVJRC0mgHmddJuqj6ZNGb6N7kBP3/XGrSXm17T0mNb5VXffiLpKPH/9xLX7JlTZeFuZeTtJbtxbVaI+LyNn0aBraPUwkQtpX0d1cl6Go2k5W4kVXlKSsRKbM6U8ZJqG1Fm7T+JF/AshIl2iaO7S+pf4S2TdnOnv5kiW3UbCL/vS61wnvH6V22nz7AKSCZ78MM/+eyxmDvDkjblUiaSk1aVd5+7j8fNk36yUh+7ck4F7atqtSTlWw6TJX8JI327fJXjv9ZLLmEwnTbWUXSFhFxje2tJN0UHSwevLQ5aE6ewD9MJvpbSc3+XhmcN5l7LQ3Z/Mdq9Gk7lbmmd0v1k5qcVNGmrz+tK/U0nYYyro2seZ1ZiWMpSYdVW71kiceq1KCvnSzhJRM3eiKalYBtLfN9mMH2xpI+orGFud8Rfcs1DaA/WUmrWRWs1lI5vjZWOR9+ocn50C2TX/vayTgXtqqq1NdOyjljGI1skJll/O0qqdUtq6l+T61s2lHiklDwAkmPVIOEAndcxcgDmjA/rg8plTvcMqmpr51hq2izlhoG9JnzOqv2UvZx1dazq3ZuiAbJOuPackSE7U1UqvUM5G/msSQHSYvXxW1aUWvY3odzIuIe2/8UEXWWzOuqP3NV5i5KpeLdFQ3bab2fXarjHKSx46JWdZxxI5hLaDh/MetcmF5VyQ0q9fRt2/pvnnqMEmROzUlVCqbxe5blILNVQoGTqxg5b8J8ViJSWuUOt0xqqtpIrWjTxUm65u/PTkxovY+rdo5RWZS+N23jzqjWlm3Q1mdV5nxtKekZkm6NiH2btNWWx5IcHtBYdnCTIGGoKivZ/pCk5SLiHba/J+myQY3yVv05QeWc0VvKL3pzRmu2k7Kf2x4XfSOYm6vcJr9W5f18X0RsPdP9qdrIShZMS9JSzt885RhVteFI/ZN0TPX1Iknzq38XSZqf1P5nOur3RYPedwP8m13cZl9IunSSn19Ws50tJb1SpS7zAdW/10k6v+HrOlrSXln7p/celnRJ4r6u3VZvv/b+TnX387i2PiXpDEnXqNTF/lbDds5L2M8Xjnvc6JyRsY8n2k7S91u8tu9XX79bfb287f5q0ZdvqhTGaHVuznwfJr2uH2b9vZL6c4HKfMqvqeRfNOpP1n4ef14ff5zUaOd8SctX3y8//rit0c7F4x43ORdeXH1tdG7uu+b8POmak/U3TzlGI2L0En8iuYrMuFtpD1GLCct9bT5o5Carv7NU24SCrCpG2RPmsyZhZ1buyEhqaludol9WRnfGJP6sRK2sxLFFtl/a106bueD/sP1xlao926pGid0OZCU5ZL4PM9xd7dsFKpXcpn0ruCP3qYxaL6+yisnaDdvJ2s9Z1XFWk7RndV7donrcRMZx2vbcnFWppyfrb551jHK7fGn6bqWFytIEjaoU9LWXdutzVLRN
KHByFaPMhIsMWQkpVVutk5qcWNHGSZV6kibxr6WcilFZiWPrqOznLVRe11ERcWfddqq25qpMITlP0lMl/ToGlCSTmBg1NJWVqv5sqpL400uuekcMcF5mdV5cX+UDxUEqo2O15/ll7efE42JjSW/XWALRR5vs56RzYVayYFaSVtbfPC0RaWSDTLesTmF7NZU3z43VCMvbVOY4nBgRf23Rr4v7Rm52s31JROzctD3kc8sqT1mJSNkJKVmcWJ0i8SSdlmyTbRCJYy71wTeMiCsn+f9VJO0UEee3+B2NX5ftJ2jJuzm1l4bLfB9ms718tF+juenvTr12DeN+dllLeXNJP42WCXGZBnGsT9CHJ2ks7rm+RTutj1FptIPM01Vul/aqyKwZ9ZJJzlCZ+/GdiLjJ9jYqiQE7RsQLWvSr1ciNh6zKxShyyypPWYlI2QkpWZxQnSI7gE6axJ+VqDXwSitV0P0RlZGIY/svENW+P1TSERHx0xptZr2uoaislM1lYfAbJT1MZaTtnIh4wwD60bt2fTciftv22pW1nxOPr2MkramxY/3OaJgQl9CXYUsS/aTKAvE/UblW/Dwi3tqgnbQ7riM3J7NP2+oUD4+I43oPolQ7uMr2VAumT8cBKp8yr1A5UOpmP9vDVeWiMdvHRMRh1d+m92mnt1zCIDPt21Z5etQEJ5pjbdf6hBv5lYOyZFSnyKzUI7Ws0lRpNa/TQ1RppRrFfbPL4uAfrkYuXf27StLLI2Ja/ergdQ1LZaVse0bE01wWLd9U0qCKZGRfu4atGs1T+u7+fc72jN/V6eCYyNo3T46+bPK615w+WcfoSAeZbZNJLqwu5ueqrA22ukqd1AVtOlXdrvhE77HtHVVGvKZr2KpcNBbJSVqJ2lZ5ykpE6slKSMnSelJ4BwF0xiT+tolaXVRaOd9lOZxGx3pE/FDSDxv+/p7s1zUslZWyPWD7zSoJLZuob+RvhmVfu4atGk1mQlxTw5okeptLZa9rVOKe39veqMHIc1qy6SjfLm9dncKljOPuktZTmVf3g4j4Vst+tRpe95BVuRhFblnlyfmJSGtpiCoHOXNSeEKlnqqdlKSCDB6ySitZEl/XUFVWymL7MSrLcJ2i8qHyNxFx9YD6knbtGsL9nJYQl9CXYUsSnagyYO0Pp1nHqDTCQaYk2X6yxqrINJ4Am9SX3vD6OyQdVf14jqR9+oPOabY1VFUu2nLJXtxNZeRGUjdVlWr0Z0aqPM1Wzq1os5YGHEA7uWKU2yeOraZSv/yvkk6LASWQzBQ3TJbIfB9icm33s+2Hq8wBvlfSJ+ocC7OB86oqZVdnulOl6MJAq94t1/aXD6tqAuyRKheOD9v+6KC7NMHX2sPrLtUXzpN0mkpg9qWk/g3SdyRtqLE5Y+PXDptpvT6sJmkfSZMmqcxIZ+zzBvn7J/BFSeuqvA83qB43EhF/iYijI+J1EXHUgEZoT1BZieI3KvPofqvydz+hYXunq9zG+5ikV0s6s+b2p6ns3ydK+mTDPgytavS634caNpX2PsSU2u7nU1TuLvxF0mfadsalqs1QqPry35LOUjluPzLgdvrjg13UMD5IPEZHek5m1gTYFBFxnaTrbD+m5ajY8yJie9sXRcSptl+X1ccBWhQRg/4QsFhEnNT38DjbrU+MLWVNCs+SkWQjKS+rsqWURK0+bRPH1o6I/5SkNvs2S1aCXgfJEmnvwzZsHxARJ1dTtJa4NVhnetYQa7ufV4qIUyXJdkYp08YJcR14pEq56VMlvVSlQs4g2+mPD75i+/V1Nu7gGB3pIDNrAmyqiHh3y2HxYaty0ZjHlrG5zPZpkk5WmfQ86HUg06s8tZQ1KTxLVkUbaTgC6OxErbaJY6tXc+pcfb9D7z8iYsYzlhMT9LKTJTLfh21cV329eEC/v2tt9/Nc2y9T+XuvW30vSYqIrzToT+uEuERZFXaGpTpT9jE6unMysybAZnPLAvYesioXbUySnCXVTNDK5uQqT6Mm
M8nGCZV62uogUatt4thE5y5pCM5fGRITiIYm2WuUtd3P2ef5YUqIc16Fna6qM304Im6beqsJ20lLaBrZIHNY2b5A0p7qGxaPiGnP+fMQVl9ow/ZTIuLqKlni5ZJWlnRKTHMNv+S+dFLlqUV/UhNSutJmUvgoJm/MhsQx25tGxG+m+dy9IuIs2w8d5KoGS9PmfZjwu9dRGeVbSyUh84o6Hyxmk0Ht52FKiOviWuGWlXpsP14lf2Atlffg/Ij4ZZO+ZBrZxJ8h1nZY/KWS3i/pvdW/IzI7N5NsnyTpNdXDY1SW3JirMml5EE5RmXvSG7W8RCX5Z1D9yU5ISZE5KVyjmbwxVIljkuRSEanfl2ts/qbq69eSupMi+X3Yph/Pk/QrldHw56rsr1/Y3mUQ/ck2LPtZw5UQ17tW/Kx63OpaUSUqv19jicr/VXP7V6ucQ7dUmUO5lcr82Rc16U+mkZuTafvtEXF0dctp/CT1YRjxe7HKsPhbVIbFa03M1fBVuWhjo4jY1fajVF7DVhERHlxFm66qPDWVnZDSSheTwjUkyRuZhilxzPZGKtNzHtc313iOym256Qrb75O0icsCzWP/UW/d4Rs1LjFGY+fmR9dop4v3YRuHqySa3tT7QbXfT1Wz6lVDYQj38zAlxGVfK9omKh+scv28o6+NtVQW5J/2h8OsY7TfyAWZknon+CMG2YnJVHO8flU9fM9Uz53EsFW5aOPO6jbD3iqf4ubY3meA/emkylML2QkpbaVPCtfwJG+kGbLEsU1U3jNrV1+tcruxzgfuvVVGRp6nMmLTSERs1nTbcbp4H7bxgKT7bM/r+9k/VO5WzWbDtp+HKSEu+1rRNlF5RUmPsT1++b+V63Qi8RhdjDmZM8RLLv+x+MeqvwzIUFVfaKOa1/IKSbdExLerEc1XqyzY+8cB9Sm9ylOLvqQmpCT2K29S+Agmbwxj4pjtL7a9k2P70IgY9G3KxTLfhy370Tu3j7/AD3oViBRDtJ+HKiEu81rRNlF5in2jiHhVkz5lGdkg0/bKKkP9/VVkBrYsTpZRTJTA7OKWFW1GVXYyQF+Sn9Ryyk+V8f5EtTwfVskFG0j6naTfR8TdTfqTgffhzGA/zwx3UKlnGIxykHmZpGslLax+NNBlcbJUE/h/qZJlvL3KvL1XTL0VkMf21yUdr3L7dB1J60XEMwfbq8GzfYak8yV9JyJusr2NpJ0l7RgRL2jQ3hWSXqZyOzYkqcbts/FttT4f2v6UykVwE0n/Iek1EfH8Jv3JwPtwZrCfu+dSqWdvleShoyTtEREPWqViNhrFOZk9D0TEGwfdiQ6MXKIEZp22FW1GVXYyQGaSX8b58AkRsYvt+dVI7dtbttcW78OZwX7uXqtKPcNs5ILMKqtPKhUBjlJJBOpVkRlotZ8kI5cogVmnbUWbUZWdDJCZ5JdxPlxYZZevbfuVkm5N6FcbvA9nBvu5eyNTyW+8kbtdPmyTg7ONYqIEZhe3rGgzypKTAdKS/CY4L9Y+H9peVWXOaa+ayOcHUaSgrz+8D2cA+7l7TqrUM4xGLsjsZ9vVuoubSLp5VCbS9vMAq1xg2eRZUNFmVNh+gpZMtGk8ijQuseC2qFkxpboQPkNjyUMxyL/7sL0PMxO1hgn7uVse0ko9WUbudnmP7c9KuqBaUPYZKrd29h1sr9qzfX5EPKvvRx/SLF7wF7NSb6mWVSXtIel2SQSZySZItDlKUqNEm4kSCyTVTSz4jqSvaCx5aPySPdPty3kR8Zwm245vqvo6LO/Dl2pcotaIYD93xKVSz3slnaOyBukjJL3DpajMwCpsJR6joxtkSnpcRLzO9sERsaPtmV6sNdUQVl/AMmqYKtqMuMxEm4zEgkUR8dEWfei53lU99DaNDOH7cJSqsS3Gfu5USqWeDqQco9JoB5n/sP1xSTfa3lb1yqgNo2GrvoBl1JBVtBllmYk2GYkFl9k+TWUUq5c81GTt4W0kHWL7+qqdRouWD+H7cJSq
sS3Gfu5USqWeDqQco9JoB5n7qdxGPk/SU1X/1tBQiYjrJF1n+zHMf8OA7Vp97VW0GZnlNobMASojHVdIWlNSm8od/6KSWHCnpL1Ur6xkz99VkhK2rR6HSsnTWiJi16U/a1qG7X34cJUlq2Z9NbZx2M/duVblGB/vJzPcjyUkHqOjnfgziqi+gEHJrmiDetok+fUlS/RGTBolS2RV/GnTzjC/DzMTtQaN/bxsyzrWl0vtFWbC6SrrY35Mpc73mYPtDpYhp6jMAe7V4b5EJZHktIH1aITZPn/cjz7UormXSnq/SpLBeyUd0aA/n5J0ZNWPTVWSgGpLaGco34fV6zpC0gdVkrVOHWR/ErCfl1FZx7pEkDkbPSwividps4jYXyXjD5gJD4+I4yLiJqlUtKkSQeYOtlujxfaW1RzMDWwfUP17ndol+fWSJb6ksiD7lxq08YSIeKGkv0TEOSq38Jto286wvg+z9s+wYD8vu9L28SjPyRxVVF/AoGRXtMHEukjyy0iWyEpEatvOsL4Ph60iUlvs52VX2j5mTuYsQ/UFDFJmRRtMzfYHI+LdSW21rh40QcWfLzQJWjPaGcb3Ydb+GSbs52VT5j4myJxlhq36AoDZwfaTJW2skshxfY3tLOnZKkuZXKGSpb66pE9HxO9rtPP0yf6v4VJIQ41qbDOD/Zyni2N0ZINM2ydExEGD7ke2auha6qu+EBGvHmCXAAw5259USZK4XiXL/GcR8bZpbnuayrqaD5O0vqSrVJZCelZETLvamO33Vt/urLIc0tVVX9ao086wGl+Nzfalo/C6hg37uTtdHKOjPCfTtreJiKsG3ZFMQ1h9AUAi2zfqweXyejWaH92w2SdHxI59v6POyM96EfHSarufRMRbqu+fXacDEXFktd2FEbF7X1/m12ln2FCNbWawn7vXxTE6ykHmSiq1y7+rsRXrmyxAPFSGsPoCgEQRsVkHzd5mez9J16gspv572xtFxO+mse1a1dw8q1RSe6rKyiRNV7Z4oFq38yeStmjYxjChGtvMYD/PnLRjdJRvlz9y/M8i4uZB9CVT33B2r/rC2RFxwxSbAFjG2T5xgh9P64P3JNv2GqhdhaiqzXywyu37myV9PiL+XLedYZOZqIXJsZ+7l3mMjmyQKS2+nbO5pBsi4oJB96eNYa6+AAAAMN7ILsZu+xhJL1GZr7F/9Xg2G8rqCwBQh+3zBt0HAJPLPEZHeU7mUyJi5+r7z9me7UtkPDwijus9qBKarrK99wD7BGCI2X57RBxd3fLu3bbqJRENao769bb3ioizBvT7U3WUqIVx2M8zKu0YHeUgc5Htl0r6kUqt79m+YPmwVl8AMLx6q1EcMchOjLONpENsX6+xpMxaC8MPk44StTAO+3lGpR2jIzsn0/Y6KosGbyHpp5KOiog7B9urdoax+gIAAMBERi7ItL1dRPxo0P0AgGFhe2WVNQZX6v1spqvs2F5OZbmZbVSWX1sk6UpJX4uIB2ayLwAerItjdBSDzPkRsdv4qgAAMJvYPi8inpPU1mWSrpW0sPpRRMT7Mtqu0YcTVeqnX6ayMsYcSTtJurfJUkgAcnVxjI7inMyVbB8oacPxdb6p8Q1gFslMkHkgIt6Y0E4bj5qgNN2xNasPAehO+jE6ikHm/pL6q+J4sicCwBBrPfne9kbVt+fbPkolEegelcamU+0n08KqD9+v+rC6pF0k/WmG+wFgYunH6MjdLu+hKgCAZd0U1XpmfAkj23MkHSJpO0lrqMz3+pGkYyPinpnsC4AH6+IYHdkgEwBmO9uPl7SBpN9J+n1E3N2iLUdE2N5E0s0k2wDo2shW/AGA2cz2pyQdKelDkjaV9JUWbX1W0j62j1SpHnZ6SiexGJWMZgb7eXYhyASA4fSEiHihpL9ExDmS1mzR1uMi4gxJ20fEjpLmpfQQ/a63vdegO7EMYD/PIqOY+AMAo2Ch7fdIWtv2KyXd2qKtf9j+uKQbbW8r6e8ZHcQSRqqS
0RBjP88izMkEgCFke1VJB0t6jKT/kfSFiLi3YVtzVda7O0/SUyX9OiJuzuorAExkmQgymegOYLaw/fTJ/m+mq/SgnsxELUyO/Tx7jOzt8mqi+wWStpT0DJVbTfsOtFMAsHS7Vl93VrmtfbWkJ6osKTJ+oWQMiSpRa56kTST9h6QPS3r+QDs1gtjPs8soJ/4w0R3ArBMRR0bEkeXb2D0i3h0Re4p5lMMuM1ELk2M/zyIjO5IpJroDmN0esH2opJ9I2mLQncFSZSZqYXLs51lklEcy91MpjfRvKqWRDpj66QAwVF4kaRWVc9lDqscYXgdIukvSFSqja68abHdGFvt5FhnZxB/b20bElYPuBwBgdJGoNTPYz7PTKN8uf63tT0j6lqRTIuIPg+4QAEyX7fMi4jlJbZ0QEQdltIUHIVFrZrCfZ6GRDTIj4kDbK0p6rqSLbP9O0lERcf6AuwYA03G97b0i4qyEtmx7m4i4KqEt9KmStGT7wojYvfdz2/MH16vRw36enUY2yLS9naT9VRYe/lr173hJBJkAZoPMyiYrSbrA9nf72jowqZ8oSNSaGeznWWSU52R+SdLJki6K6kXa3i0i+NQDYJli+5Hjf0bFn1y211Kp0LSJpJslfT4i/jzQTo0g9vPsMrJB5ni214+IPw66HwAwFdvLSXqxykjmQyQtknSlpK+1qVpm+9mSNpd0Q0RckNFXAJjKyAaZtt+vUgVg9epH90TElgPsEgAsle0TVZYuukzSvZLmqCQ23BsRjZZrsX2MpLUk/UjS9pLujIjDUjoMSbmJWpgc+3l2Gdk5mZJ2lLSDpC9IerVKljkADLtHRcT4bNljbV/Wos2nRMTO1fefs82SL/kyE7UwOfbzLDLKQaYkbaUykrmlpLkD7gsATMdC20epFJO4R+UctoukP7Voc5Htl2psJPOutp3Eg2QmamFy7OdZZJRvl6+vUq/8fkmHSfp2RJw+2F4BwNRsz5F0iKTtVNYAXKQSHB4bEfc0bHMdSe9Sycb9qcpybnfm9BgAJjaSQabtx0t6usocpDskzY+IXw60UwAww2xvFxE/GnQ/RllXiVpYEvt5dhq52uW2Xy3pPJVb5HNUbpl/1zZ1fwEsaz4kSbZZH7g7J0jaS9JvJF0u6beS9ql+jjzs51loFOdkHixpq4i4o/eDal2tc1UWZAeAZcVKtg+UtKHtA/r/IyJOHlCfRk0XiVp4MPbzLDSKQeaKkh5j2+N+vvIgOgMAA7S/ytShnvHnRbTXRaIWHoz9PAuN3JzMao25CTVdYw4AZjPbH4yIdw+6H6Ooi0QtPBj7eXYauSATAAAAgzdyiT8AAAAYPIJMAAAApCPIBAAAQDqCTAAAAKQjyASAZYjtTarqKQDQqVFcJxMA0Mf2ZyVdoFIJ7RmSbpW070A7BWDk8WkWAEbf4yLiDEnbR8SOkuYNukMARh9BJgCMvn/Y/rikG21vK+nvA+4PgGUAi7EDwIizPVfSTpLOk/RUSb+OiJsH2ysAo44gEwBGnO1tI+LKQfcDwLKF2+UAMPpea/sK2++yveGgOwNg2UCQCQAjLiIOlPR0ST+XdJHtC20/a8DdAjDiuF0OACPO9naS9leZj3m+pK9JOj4ithloxwCMNIJMABhxtr8k6WRJF0V10re9W0TMH2jHAIw0gkwAWMbYXj8i/jjofgAYbVT8AYARZ/v9kp4vafXqR/eoVP8BgM6Q+AMAo29HSTtIulIluFw42O4AWBYQZALAsmErlZHMLSXNHXBfACwDmJMJACPO9voq9crvl3SYpG9HxOmD7RWAUUeQCQAjzPbjVdbIXEvSHZLmR8QvB9opAMsEbpcDwIiy/WqVeuVbSpqjcsv8u7ZfNNCOAVgmMJIJACPK9pWS9oiIO/p+tpakcyNih4F1DMAygSWMAGB0rSjpMbY97ucrD6IzAJYtBJkAMLqulXTwBD//yQz3A8AyiNvlAAAASEfiDwAAANIRZAIAACAdQSYAdMj2
d2zPsb1c9fh7tpe3vXz12LbPsr2h7ZXGbbtC73kAMNsQZAJAB2w/yvZ7JP0jIu6RtLvt+ZKeJOm7ki6ogsr3qiTorCrpItsX277b9sWSLpK050BeAAC0RHY5AHRjS0nPlLSh7c9KOjQizrP97Yj4Z0myvZekLSR9VtIDEfG06ufXRsQuA+o3AKRgJBMAurGDpNdK+llEvE7Sf1YjmavZ/pXtqyQ9WtLrJX1cUth+an8D1W318WtcAsCswBJGANAB23Ml/VnStySdLOmsiLiv+r8jJJ0j6RpJp0n6U0S8sRrBfKLtayXdrnK36ZCIuH4ALwEAWuF2OQB0Y11J75K0o6TvSXqT7edKCkkbS3qepOMl/a1vm//rfRMRz5yxngJABwgyAaAby0k6S9KjI+KTtleUpIg4uhrJ/Lak6yVdLelfBtVJAOgKczIBoAMRcX1EXNL3+O+SHmZ7Tt/P7pNklZWMVhrfRjUnc8UZ6TAAJCPIBIBurWh7ZUmKiLdLermkPST9ofr/lSWtobKs0d22L5B0e/X1fElvm/kuA0B7JP4AwAyy7eDEC2AZQJAJAACAdNwuBwAAQDqCTAAAAKQjyAQAAEA6gkwAAACkI8gEAABAuv8P7Y2ksVP/baQAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 720x360 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "feat_importance=feat_importance.sort_values(by='重要程度',ascending=False)\n",
    "sns.catplot(x='特征',y='重要程度',data=feat_importance,kind='bar',height=5,aspect=2)\n",
    "plt.xticks(rotation=90)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 212,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('预测是否欺诈订单fraud',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.9782289560424341,\n",
       " ('预测是否延迟订单late',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.988809794199928,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.880591640583885,\n",
       " ('预测是否延迟订单late',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.5718915325596211,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.9776749854582721,\n",
       " ('预测是否延迟订单late',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.988809794199928,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
       "                         max_depth=None, max_features=None, max_leaf_nodes=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, presort='deprecated',\n",
       "                         random_state=None, splitter='best')): 0.990776389773703,\n",
       " ('预测是否延迟订单late',\n",
       "  DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
       "                         max_depth=None, max_features=None, max_leaf_nodes=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, presort='deprecated',\n",
       "                         random_state=None, splitter='best')): 0.9941556103370911}"
      ]
     },
     "execution_count": 212,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "accuracy_list  # inspect each (task, model) pair and its accuracy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.5 Neural network (Keras)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 231,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Try a neural-network classifier, built here with Keras\n",
    "import tensorflow.keras as keras\n",
    "from tensorflow.keras import Sequential\n",
    "from tensorflow.keras.layers import Dense"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 233,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/10\n",
      "144408/144408 [==============================] - 4s 30us/sample - loss: 0.2163 - acc: 0.9775\n",
      "Epoch 2/10\n",
      "144408/144408 [==============================] - 4s 26us/sample - loss: 0.0532 - acc: 0.9775\n",
      "Epoch 3/10\n",
      "144408/144408 [==============================] - 4s 25us/sample - loss: 0.0502 - acc: 0.9775\n",
      "Epoch 4/10\n",
      "144408/144408 [==============================] - 4s 25us/sample - loss: 0.0485 - acc: 0.9775\n",
      "Epoch 5/10\n",
      "144408/144408 [==============================] - 4s 26us/sample - loss: 0.0467 - acc: 0.9775\n",
      "Epoch 6/10\n",
      "144408/144408 [==============================] - 4s 26us/sample - loss: 0.0463 - acc: 0.9775\n",
      "Epoch 7/10\n",
      "144408/144408 [==============================] - 4s 25us/sample - loss: 0.0440 - acc: 0.9775\n",
      "Epoch 8/10\n",
      "144408/144408 [==============================] - 4s 26us/sample - loss: 0.0418 - acc: 0.9775\n",
      "Epoch 9/10\n",
      "144408/144408 [==============================] - 4s 27us/sample - loss: 0.0399 - acc: 0.9775\n",
      "Epoch 10/10\n",
      "144408/144408 [==============================] - 4s 25us/sample - loss: 0.0369 - acc: 0.9775\n",
      "144408/144408 [==============================] - 9s 60us/sample - loss: 0.0355 - acc: 0.9775\n",
      "36103/36103 [==============================] - 2s 59us/sample - loss: 0.0511 - acc: 0.9773\n",
      "训练集准确率: [0.03548988629488649, 0.9775428]\n",
      "测试集准确率: [0.0510716830771359, 0.9773149]\n"
     ]
    }
   ],
   "source": [
    "classifier=Sequential()\n",
    "# Hidden layer 1\n",
    "classifier.add(Dense(1024,activation='relu',kernel_initializer='random_normal',input_dim=34))  # input_dim=34 because there are 34 features\n",
    "# Hidden layer 2\n",
    "classifier.add(Dense(512,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 3\n",
    "classifier.add(Dense(256,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 4\n",
    "classifier.add(Dense(128,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 5\n",
    "classifier.add(Dense(64,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 6\n",
    "classifier.add(Dense(32,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 7\n",
    "classifier.add(Dense(16,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 8\n",
    "classifier.add(Dense(8,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 9\n",
    "classifier.add(Dense(4,activation='relu',kernel_initializer='random_normal'))\n",
    "# Hidden layer 10\n",
    "classifier.add(Dense(2,activation='relu',kernel_initializer='random_normal'))\n",
    "# Output layer\n",
    "classifier.add(Dense(1,activation='sigmoid',kernel_initializer='random_normal'))\n",
    "\n",
    "# Set the optimizer, loss function and metric\n",
    "classifier.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])\n",
    "# Train the model on the fraud training set (features and labels)\n",
    "classifier.fit(x_fraud_train,y_fraud_train,batch_size=512,epochs=10)\n",
    "# Evaluate on the fraud training and test sets; evaluate() returns [loss, accuracy]\n",
    "train_evaluate=classifier.evaluate(x_fraud_train,y_fraud_train)\n",
    "test_evaluate=classifier.evaluate(x_fraud_test,y_fraud_test)\n",
    "print(\"训练集准确率:\",train_evaluate)\n",
    "print(\"测试集准确率:\",test_evaluate)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 234,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('预测是否欺诈订单fraud',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.9782289560424341,\n",
       " ('预测是否延迟订单late',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.988809794199928,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.880591640583885,\n",
       " ('预测是否延迟订单late',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.5718915325596211,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.9776749854582721,\n",
       " ('预测是否延迟订单late',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.988809794199928,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
       "                         max_depth=None, max_features=None, max_leaf_nodes=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, presort='deprecated',\n",
       "                         random_state=None, splitter='best')): 0.990776389773703,\n",
       " ('预测是否延迟订单late',\n",
       "  DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
       "                         max_depth=None, max_features=None, max_leaf_nodes=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, presort='deprecated',\n",
       "                         random_state=None, splitter='best')): 0.9941556103370911}"
      ]
     },
     "execution_count": 234,
     "metadata": {},
     "output_type": "execute_result"
    }
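   ],
   "source": [
    "# The neural network reaches about 97.7% accuracy on the fraud task.\n",
    "# The accuracy is the same 0.9775 in every epoch, which suggests the network may be predicting only the majority (non-fraud) class.\n",
    "accuracy_list  # so it is not necessarily better than the classical models above"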
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.6 Model ensembling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 236,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.4655426480468948 0.021003821875596004 LR\n",
      "0.8031047763627761 0.03281942079770072 DT\n",
      "0.7909177422620084 0.03677609873177983 Voting\n"
     ]
    }
   ],
   "source": [
    "from sklearn.ensemble import VotingClassifier\n",
    "from sklearn.model_selection import cross_val_score\n",
    "# Soft voting: voting='soft' averages the predicted class probabilities\n",
    "eclf=VotingClassifier( estimators=[('LR',model_late_LR),('DCT',model_fruad_DT)],voting='soft' )\n",
    "for clf,label in zip([model_late_LR,model_fruad_DT,eclf],['LR','DT','Voting']):\n",
    "    scores=cross_val_score(clf,x_fraud,y_fraud,cv=5,scoring='roc_auc')  # 5-fold cross-validation\n",
    "    print(scores.mean(),scores.std(),label)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:fraud --------------------------------------------------------------------------------\n",
      "使用的模型：LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,\n",
      "                           solver='svd', store_covariance=False, tol=0.0001)\n",
      "准确率：0.9787275295681799\n",
      "召回率：0.5415986949429038\n",
      "Auc：0.7639382598411335\n",
      "F1-score：0.46368715083798884\n",
      "混淆矩阵：\n",
      "[[35003   487]\n",
      " [  281   332]]\n",
      "预测:fraud --------------------------------------------------------------------------------\n",
      "预测:late --------------------------------------------------------------------------------\n",
      "使用的模型：LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,\n",
      "                           solver='svd', store_covariance=False, tol=0.0001)\n",
      "准确率：0.9841287427637593\n",
      "召回率：0.9776763567814267\n",
      "Auc：0.9849811699241094\n",
      "F1-score：0.9856710595413739\n",
      "混淆矩阵：\n",
      "[[15822   123]\n",
      " [  450 19708]]\n",
      "预测:late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# LDA model\n",
    "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n",
    "model_fraud_LDA=LinearDiscriminantAnalysis()\n",
    "model_late_LDA=LinearDiscriminantAnalysis()\n",
    "model_stats(model_fraud_LDA,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'fraud')\n",
    "model_stats(model_late_LDA,x_late_train,x_late_test,y_late_train,y_late_test,'late')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.7 Random forest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:fraud --------------------------------------------------------------------------------\n",
      "使用的模型：RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
      "                       criterion='gini', max_depth=None, max_features='auto',\n",
      "                       max_leaf_nodes=None, max_samples=None,\n",
      "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                       min_samples_leaf=1, min_samples_split=2,\n",
      "                       min_weight_fraction_leaf=0.0, n_estimators=100,\n",
      "                       n_jobs=None, oob_score=False, random_state=None,\n",
      "                       verbose=0, warm_start=False)\n",
      "准确率：0.9891144780212171\n",
      "召回率：0.9819004524886877\n",
      "Auc：0.9855521723479305\n",
      "F1-score：0.6883425852498017\n",
      "混淆矩阵：\n",
      "[[35276   385]\n",
      " [    8   434]]\n",
      "预测:fraud --------------------------------------------------------------------------------\n",
      "预测:late --------------------------------------------------------------------------------\n",
      "使用的模型：RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
      "                       criterion='gini', max_depth=None, max_features='auto',\n",
      "                       max_leaf_nodes=None, max_samples=None,\n",
      "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                       min_samples_leaf=1, min_samples_split=2,\n",
      "                       min_weight_fraction_leaf=0.0, n_estimators=100,\n",
      "                       n_jobs=None, oob_score=False, random_state=None,\n",
      "                       verbose=0, warm_start=False)\n",
      "准确率：0.9918012353544027\n",
      "召回率：0.985293386992597\n",
      "Auc：0.9926466934962985\n",
      "F1-score：0.9925922218329246\n",
      "混淆矩阵：\n",
      "[[15976     0]\n",
      " [  296 19831]]\n",
      "预测:late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# Random forest\n",
    "model_fraud_RF=RandomForestClassifier()\n",
    "model_late_RF=RandomForestClassifier()\n",
    "model_stats(model_fraud_RF,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'fraud')\n",
    "model_stats(model_late_RF,x_late_train,x_late_test,y_late_train,y_late_test,'late')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.8 XGBoost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 239,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:fraud --------------------------------------------------------------------------------\n",
      "使用的模型：RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
      "                       criterion='gini', max_depth=None, max_features='auto',\n",
      "                       max_leaf_nodes=None, max_samples=None,\n",
      "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                       min_samples_leaf=1, min_samples_split=2,\n",
      "                       min_weight_fraction_leaf=0.0, n_estimators=100,\n",
      "                       n_jobs=None, oob_score=False, random_state=None,\n",
      "                       verbose=0, warm_start=False)\n",
      "准确率：0.9890590809628009\n",
      "召回率：0.9774774774774775\n",
      "Auc：0.9833403820826351\n",
      "F1-score：0.6872525732383216\n",
      "混淆矩阵：\n",
      "[[35274   385]\n",
      " [   10   434]]\n",
      "预测:fraud --------------------------------------------------------------------------------\n",
      "预测:late --------------------------------------------------------------------------------\n",
      "使用的模型：RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
      "                       criterion='gini', max_depth=None, max_features='auto',\n",
      "                       max_leaf_nodes=None, max_samples=None,\n",
      "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                       min_samples_leaf=1, min_samples_split=2,\n",
      "                       min_weight_fraction_leaf=0.0, n_estimators=100,\n",
      "                       n_jobs=None, oob_score=False, random_state=None,\n",
      "                       verbose=0, warm_start=False)\n",
      "准确率：0.991884330942027\n",
      "召回率：0.9854402703239913\n",
      "Auc：0.9927201351619956\n",
      "F1-score：0.9926667500938556\n",
      "混淆矩阵：\n",
      "[[15979     0]\n",
      " [  293 19831]]\n",
      "预测:late --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "#Xgboost\n",
    "import xgboost as xgb\n",
    "model_fraud_RF=RandomForestClassifier()\n",
    "model_late_RF=RandomForestClassifier()\n",
    "model_stats(model_fraud_RF,x_fraud_train,x_fraud_test,y_fraud_train,y_fraud_test,'fraud')\n",
    "model_stats(model_late_RF,x_late_train,x_late_test,y_late_train,y_late_test,'late')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 240,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('预测是否欺诈订单fraud',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.7681355440013277,\n",
       " ('预测是否延迟订单late',\n",
       "  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
       "                     intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
       "                     multi_class='auto', n_jobs=None, penalty='l2',\n",
       "                     random_state=None, solver='lbfgs', tol=0.0001, verbose=0,\n",
       "                     warm_start=False)): 0.9900172967630343,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.5798245614035088,\n",
       " ('预测是否延迟订单late',\n",
       "  GaussianNB(priors=None, var_smoothing=1e-09)): 0.7809958341598889,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.7531908690724671,\n",
       " ('预测是否延迟订单late',\n",
       "  LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,\n",
       "            intercept_scaling=1, loss='squared_hinge', max_iter=1000,\n",
       "            multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,\n",
       "            verbose=0)): 0.9900172967630343,\n",
       " ('预测是否欺诈订单fraud',\n",
       "  DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
       "                         max_depth=None, max_features=None, max_leaf_nodes=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, presort='deprecated',\n",
       "                         random_state=None, splitter='best')): 0.892430906940095,\n",
       " ('预测是否延迟订单late',\n",
       "  DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
       "                         max_depth=None, max_features=None, max_leaf_nodes=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, presort='deprecated',\n",
       "                         random_state=None, splitter='best')): 0.9940688328469708,\n",
       " ('fraud',\n",
       "  LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,\n",
       "                             solver='svd', store_covariance=False, tol=0.0001)): 0.7639382598411335,\n",
       " ('late',\n",
       "  LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,\n",
       "                             solver='svd', store_covariance=False, tol=0.0001)): 0.9849811699241094,\n",
       " ('fraud',\n",
       "  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
       "                         criterion='gini', max_depth=None, max_features='auto',\n",
       "                         max_leaf_nodes=None, max_samples=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, n_estimators=100,\n",
       "                         n_jobs=None, oob_score=False, random_state=None,\n",
       "                         verbose=0, warm_start=False)): 0.9855521723479305,\n",
       " ('late',\n",
       "  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
       "                         criterion='gini', max_depth=None, max_features='auto',\n",
       "                         max_leaf_nodes=None, max_samples=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, n_estimators=100,\n",
       "                         n_jobs=None, oob_score=False, random_state=None,\n",
       "                         verbose=0, warm_start=False)): 0.9926466934962985,\n",
       " ('fraud',\n",
       "  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
       "                         criterion='gini', max_depth=None, max_features='auto',\n",
       "                         max_leaf_nodes=None, max_samples=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, n_estimators=100,\n",
       "                         n_jobs=None, oob_score=False, random_state=None,\n",
       "                         verbose=0, warm_start=False)): 0.9833403820826351,\n",
       " ('late',\n",
       "  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
       "                         criterion='gini', max_depth=None, max_features='auto',\n",
       "                         max_leaf_nodes=None, max_samples=None,\n",
       "                         min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                         min_samples_leaf=1, min_samples_split=2,\n",
       "                         min_weight_fraction_leaf=0.0, n_estimators=100,\n",
       "                         n_jobs=None, oob_score=False, random_state=None,\n",
       "                         verbose=0, warm_start=False)): 0.9927201351619956}"
      ]
     },
     "execution_count": 240,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#从auc的角度来看看谁更好\n",
    "auc_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 7回归任务：对销售额Sales以及订单数量Order Item Quantity预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.1准备特征和标签"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 242,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "35\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Index(['Type', 'Days for shipping (real)', 'Days for shipment (scheduled)',\n",
       "       'Benefit per order', 'Category Id', 'Category Name', 'Customer City',\n",
       "       'Customer Country', 'Customer Id', 'Customer Segment', 'Customer State',\n",
       "       'Customer Zipcode', 'Department Id', 'Department Name', 'Market',\n",
       "       'Order City', 'Order Country', 'Order Id', 'Order Item Discount',\n",
       "       'Order Item Discount Rate', 'Order Item Product Price',\n",
       "       'Order Item Profit Ratio', 'Order Item Quantity', 'Sales',\n",
       "       'Order Region', 'Order State', 'Product Name', 'Shipping Mode',\n",
       "       'Customer Full Name', 'order_year', 'order_month', 'order_weekday',\n",
       "       'order_hour', 'fraud', 'late_delivery'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 242,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#准备对销售字段进行预测 即sales字段    还有对 #定订单数量进行预测，即Order ItemQuantity\n",
    "#在此之前看看 train_data里面有哪些字段\n",
    "print(len(train_data.columns))#特征是35个  跟上面的结果是一样的\n",
    "train_data.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 243,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(180511, 34)\n",
      "(180511,)\n",
      "(180511, 34)\n",
      "(180511,)\n"
     ]
    }
   ],
   "source": [
    "#对 销售额 进行预测,即sales字段\n",
    "x_sales=train_data.loc[:,train_data.columns!='Sales']\n",
    "y_sales=train_data['Sales']#这样就筛选好了关于 销售字段 Sales和特征\n",
    "print(x_sales.shape)\n",
    "print(y_sales.shape)\n",
    "\n",
    "#商品订单数量进行预测，即Order ItemQuantity\n",
    "x_quantity=train_data.loc[:,train_data.columns!='Order Item Quantity']\n",
    "y_quantity=train_data['Order Item Quantity']\n",
    "print(x_quantity.shape)\n",
    "print(y_quantity.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.2数据集切分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 244,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(144408, 34)\n",
      "(36103, 34)\n",
      "(144408,)\n",
      "(36103,)\n",
      "(144408, 34)\n",
      "(36103, 34)\n",
      "(144408,)\n",
      "(36103,)\n"
     ]
    }
   ],
   "source": [
    "#数据集切分\n",
    "x_sales_train,x_sales_test,y_sales_train,y_sales_test=train_test_split(x_sales,y_sales,test_size=0.2)\n",
    "print(x_sales_train.shape)\n",
    "print(x_sales_test.shape)\n",
    "print(y_sales_train.shape)\n",
    "print(y_sales_test.shape)\n",
    "x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test=train_test_split(x_quantity,y_quantity,test_size=0.2)\n",
    "print(x_quantity_train.shape)\n",
    "print(x_quantity_test.shape)\n",
    "print(y_quantity_train.shape)\n",
    "print(y_quantity_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.3模型预测和评估"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 245,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import mean_absolute_error,mean_squared_error\n",
    "#写一个函数 把模型和训练集测试集喂入 然后训练和打印出模型的效果\n",
    "def regression_model_stats(model,x_train,x_test,y_train,y_test,model_name='Sales'):\n",
    "    model=model.fit(x_train,y_train)#用训练集进行模型的训练\n",
    "    y_pred=model.predict(x_test)#训练好的模型用测试集预测\n",
    "    print('预测:{}'.format(model_name),'-'*80)\n",
    "    print('使用的模型:{}'.format(model))\n",
    "    mae=mean_absolute_error(y_test,y_pred)#用预测值和测试集的值做差\n",
    "    mse=mean_squared_error(y_test,y_pred)\n",
    "    print('平均绝对误差(MAE):{}'.format(mae))\n",
    "    print('均方误差(MSE):{}'.format(mse))\n",
    "    print('预测:{}'.format(model_name),'-'*80)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3.1线性回归"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 246,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:sales --------------------------------------------------------------------------------\n",
      "使用的模型:LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)\n",
      "平均绝对误差(MAE):18.339871509055907\n",
      "均方误差(MSE):948.8802856886757\n",
      "预测:sales --------------------------------------------------------------------------------\n",
      "预测:quantity --------------------------------------------------------------------------------\n",
      "使用的模型:LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)\n",
      "平均绝对误差(MAE):0.3465300997560983\n",
      "均方误差(MSE):0.2788238813723401\n",
      "预测:quantity --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "#线性回归\n",
    "from sklearn.linear_model import LinearRegression\n",
    "model_sale_LN=LinearRegression()\n",
    "model_quantity_LN=LinearRegression()\n",
    "regression_model_stats(model_sale_LN,x_sales_train,x_sales_test,y_sales_train,y_sales_test,'sales')\n",
    "regression_model_stats(model_quantity_LN,x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test,'quantity')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3.2Lasso回归"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 247,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:sales --------------------------------------------------------------------------------\n",
      "使用的模型:Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,\n",
      "      normalize=False, positive=False, precompute=False, random_state=None,\n",
      "      selection='cyclic', tol=0.0001, warm_start=False)\n",
      "平均绝对误差(MAE):18.669408974059913\n",
      "均方误差(MSE):1034.2575731395295\n",
      "预测:sales --------------------------------------------------------------------------------\n",
      "预测:quantity --------------------------------------------------------------------------------\n",
      "使用的模型:Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,\n",
      "      normalize=False, positive=False, precompute=False, random_state=None,\n",
      "      selection='cyclic', tol=0.0001, warm_start=False)\n",
      "平均绝对误差(MAE):0.36301637116845953\n",
      "均方误差(MSE):0.301405823167176\n",
      "预测:quantity --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "#lasso\n",
    "from sklearn.linear_model import Lasso\n",
    "model_sale_Lasso=Lasso()\n",
    "model_quantity_Lasso=Lasso()\n",
    "regression_model_stats(model_sale_Lasso,x_sales_train,x_sales_test,y_sales_train,y_sales_test,'sales')\n",
    "regression_model_stats(model_quantity_Lasso,x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test,'quantity')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3.3Ridge回归"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 248,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:sales --------------------------------------------------------------------------------\n",
      "使用的模型:Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,\n",
      "      normalize=False, random_state=None, solver='auto', tol=0.001)\n",
      "平均绝对误差(MAE):18.335169699545034\n",
      "均方误差(MSE):948.8615227069722\n",
      "预测:sales --------------------------------------------------------------------------------\n",
      "预测:quantity --------------------------------------------------------------------------------\n",
      "使用的模型:Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,\n",
      "      normalize=False, random_state=None, solver='auto', tol=0.001)\n",
      "平均绝对误差(MAE):0.34652657586295216\n",
      "均方误差(MSE):0.278824352484616\n",
      "预测:quantity --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "#ridge\n",
    "from sklearn.linear_model import Ridge\n",
    "model_sale_Ridge=Ridge()\n",
    "model_quantity_Ridge=Ridge()\n",
    "regression_model_stats(model_sale_Ridge,x_sales_train,x_sales_test,y_sales_train,y_sales_test,'sales')\n",
    "regression_model_stats(model_quantity_Ridge,x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test,'quantity')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3.4回归树"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 249,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:sales --------------------------------------------------------------------------------\n",
      "使用的模型:DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,\n",
      "                      max_features=None, max_leaf_nodes=None,\n",
      "                      min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                      min_samples_leaf=1, min_samples_split=2,\n",
      "                      min_weight_fraction_leaf=0.0, presort='deprecated',\n",
      "                      random_state=None, splitter='best')\n",
      "平均绝对误差(MAE):0.0008301249650290871\n",
      "均方误差(MSE):0.024878845855690723\n",
      "预测:sales --------------------------------------------------------------------------------\n",
      "预测:quantity --------------------------------------------------------------------------------\n",
      "使用的模型:DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,\n",
      "                      max_features=None, max_leaf_nodes=None,\n",
      "                      min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                      min_samples_leaf=1, min_samples_split=2,\n",
      "                      min_weight_fraction_leaf=0.0, presort='deprecated',\n",
      "                      random_state=None, splitter='best')\n",
      "平均绝对误差(MAE):0.0\n",
      "均方误差(MSE):0.0\n",
      "预测:quantity --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "#回归树\n",
    "model_sale_DCT=DecisionTreeRegressor()\n",
    "model_quantity_DCT=DecisionTreeRegressor()\n",
    "regression_model_stats(model_sale_DCT,x_sales_train,x_sales_test,y_sales_train,y_sales_test,'sales')\n",
    "regression_model_stats(model_quantity_DCT,x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test,'quantity')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3.5Xgboost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 250,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:sales --------------------------------------------------------------------------------\n",
      "使用的模型:XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
      "             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
      "             importance_type='gain', interaction_constraints='',\n",
      "             learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
      "             min_child_weight=1, missing=nan, monotone_constraints='()',\n",
      "             n_estimators=100, n_jobs=0, num_parallel_tree=1,\n",
      "             objective='reg:squarederror', random_state=0, reg_alpha=0,\n",
      "             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\n",
      "             validate_parameters=1, verbosity=None)\n",
      "平均绝对误差(MAE):0.010225719073793733\n",
      "均方误差(MSE):0.004750591125912661\n",
      "预测:sales --------------------------------------------------------------------------------\n",
      "预测:quantity --------------------------------------------------------------------------------\n",
      "使用的模型:XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
      "             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
      "             importance_type='gain', interaction_constraints='',\n",
      "             learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
      "             min_child_weight=1, missing=nan, monotone_constraints='()',\n",
      "             n_estimators=100, n_jobs=0, num_parallel_tree=1,\n",
      "             objective='reg:squarederror', random_state=0, reg_alpha=0,\n",
      "             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\n",
      "             validate_parameters=1, verbosity=None)\n",
      "平均绝对误差(MAE):4.619516151002829e-05\n",
      "均方误差(MSE):2.179873912882469e-07\n",
      "预测:quantity --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "model_sale_XGBR=xgb.XGBRegressor()\n",
    "model_quantity_XGBR=xgb.XGBRegressor()\n",
    "regression_model_stats(model_sale_XGBR,x_sales_train,x_sales_test,y_sales_train,y_sales_test,'sales')\n",
    "regression_model_stats(model_quantity_XGBR,x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test,'quantity')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3.6lightgbm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 251,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:sales --------------------------------------------------------------------------------\n",
      "使用的模型:LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,\n",
      "              importance_type='split', learning_rate=0.1, max_depth=-1,\n",
      "              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,\n",
      "              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,\n",
      "              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,\n",
      "              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)\n",
      "平均绝对误差(MAE):0.12335149258782857\n",
      "均方误差(MSE):2.235527613905705\n",
      "预测:sales --------------------------------------------------------------------------------\n",
      "预测:quantity --------------------------------------------------------------------------------\n",
      "使用的模型:LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,\n",
      "              importance_type='split', learning_rate=0.1, max_depth=-1,\n",
      "              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,\n",
      "              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,\n",
      "              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,\n",
      "              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)\n",
      "平均绝对误差(MAE):0.0003912786021457423\n",
      "均方误差(MSE):1.0607813281283715e-05\n",
      "预测:quantity --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "import lightgbm as lgb\n",
    "model_sale_lgbR=lgb.LGBMRegressor()\n",
    "model_quantity_lgbR=lgb.LGBMRegressor()\n",
    "regression_model_stats(model_sale_lgbR,x_sales_train,x_sales_test,y_sales_train,y_sales_test,'sales')\n",
    "regression_model_stats(model_quantity_lgbR,x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test,'quantity')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3.7随机森林"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 252,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "预测:sales --------------------------------------------------------------------------------\n",
      "使用的模型:RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n",
      "                      max_depth=None, max_features='auto', max_leaf_nodes=None,\n",
      "                      max_samples=None, min_impurity_decrease=0.0,\n",
      "                      min_impurity_split=None, min_samples_leaf=1,\n",
      "                      min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
      "                      n_estimators=100, n_jobs=None, oob_score=False,\n",
      "                      random_state=None, verbose=0, warm_start=False)\n",
      "平均绝对误差(MAE):0.0007241585364883126\n",
      "均方误差(MSE):0.0074285638039471615\n",
      "预测:sales --------------------------------------------------------------------------------\n",
      "预测:quantity --------------------------------------------------------------------------------\n",
      "使用的模型:RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n",
      "                      max_depth=None, max_features='auto', max_leaf_nodes=None,\n",
      "                      max_samples=None, min_impurity_decrease=0.0,\n",
      "                      min_impurity_split=None, min_samples_leaf=1,\n",
      "                      min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
      "                      n_estimators=100, n_jobs=None, oob_score=False,\n",
      "                      random_state=None, verbose=0, warm_start=False)\n",
      "平均绝对误差(MAE):5.262720549538732e-06\n",
      "均方误差(MSE):9.140514638672445e-08\n",
      "预测:quantity --------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "model_sale_RF=RandomForestRegressor()\n",
    "model_quantity_RF=RandomForestRegressor()\n",
    "regression_model_stats(model_sale_RF,x_sales_train,x_sales_test,y_sales_train,y_sales_test,'sales')\n",
    "regression_model_stats(model_quantity_RF,x_quantity_train,x_quantity_test,y_quantity_train,y_quantity_test,'quantity')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
